"Anthropic Warns of AI 'Sleeper Agents' Capable of Potentially Malicious Code Creation, Despite Safety Training"

AI poisoning could turn open models into destructive “sleeper agents,” says Anthropic

The sparrow flies at midnight — Trained LLMs that seem normal can generate vulnerable code given different triggers. Benj Edwards | Getty Images Imagine downloading an open source AI language model, and all seems good at first, but it later turns malicious. On Friday, Anthropic—the maker of ChatGPT competitor Claude—released a research paper about AI "sleeper agent" large language models (LLMs) that initially seem normal but can deceptively output vulnerable code when given special instructio...