Researchers develop an AI monitoring agent to detect and stop harmful outputs

Researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University, and Microsoft Research have developed a tool that monitors large language models (LLMs) for potentially harmful outputs and stops them before they are executed.

The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs such as code attacks before they happen.

Per the research:

“Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and…”
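To illustrate the general idea described in that quote, the sketch below shows a minimal, hypothetical monitor that audits each proposed agent action against a risk threshold before letting it run, and ranks flagged actions for later review. The class and function names (SafetyMonitor, score_risk, ProposedAction) and the keyword-based scorer are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: a context-sensitive "monitor" that scores proposed
# agent actions and blocks those that cross a safety threshold. Names and the
# toy risk scorer are hypothetical; the paper's monitor is more sophisticated.

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    description: str   # natural-language summary of what the agent wants to do
    payload: str       # e.g. code or a shell command the agent intends to run


def score_risk(action: ProposedAction, context: str) -> float:
    """Hypothetical context-sensitive scorer returning a risk score in [0, 1].
    In practice this could itself be an LLM prompted with the test context."""
    risky_markers = ["rm -rf", "curl http", "eval("]
    hits = sum(marker in action.payload for marker in risky_markers)
    return min(1.0, hits / len(risky_markers))


@dataclass
class SafetyMonitor:
    threshold: float = 0.3
    flagged: list[tuple[float, ProposedAction]] = field(default_factory=list)

    def audit(self, action: ProposedAction, context: str,
              execute: Callable[[ProposedAction], None]) -> bool:
        """Execute the action only if it stays inside the safety boundary."""
        risk = score_risk(action, context)
        if risk >= self.threshold:
            # Rank suspect behavior for human review instead of executing it.
            self.flagged.append((risk, action))
            self.flagged.sort(key=lambda item: item[0], reverse=True)
            return False
        execute(action)
        return True
```

In this toy version, blocked actions are simply stored and sorted by risk score, which mirrors the quoted idea of ranking suspect behavior rather than silently discarding it.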
