New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core
January 12, 2024 2:54 PM
New research is raising concern among AI experts that AI systems can learn and maintain deceptive behaviors even when subjected to safety training protocols designed to detect and mitigate such issues.
Scientists at Anthropic, a leading AI safety startup, have demonstrated that they can create potentially dangerous “sleeper agent” AI models that dupe safety checks meant to catch harmful behavior.
The finding...
Read more at venturebeat.com