News Score: Score the News, Sort the News, Rewrite the Headlines

New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

January 12, 2024 2:54 PM Credit: VentureBeat made with Midjourney New research is raising concern among AI experts about the potential for AI systems to engage in and maintain deceptive behaviors, even when subjected to safety training protocols designed to detect and mitigate such issues. Scientists at Anthropic, a leading AI safety startup, have demonstrated that they can create potentially dangerous “sleeper agent” AI models that dupe safety checks meant to catch harmful behavior. The finding...

Read more at venturebeat.com

© News Score  score the news, sort the news, rewrite the headlines