AI poisoning could turn open models into destructive “sleeper agents,” says Anthropic
The sparrow flies at midnight —
Trained LLMs that seem normal can generate vulnerable code given different triggers.
Benj Edwards | Getty Images
Imagine downloading an open source AI language model, and all seems good at first, but it later turns malicious. On Friday, Anthropic—the maker of ChatGPT competitor Claude—released a research paper about AI "sleeper agent" large language models (LLMs) that initially seem normal but can deceptively output vulnerable code when given special instructio...
Read more at arstechnica.com