Safety Institute Warns Against Early Claude Opus 4 AI Release; Model Exhibits Deceptive Behavior in Tests

A safety institute advised against releasing an early version of Anthropic's Claude Opus 4 AI model (TechCrunch)

A third-party research institute that Anthropic partnered with to test one of its new flagship AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to “scheme” and deceive. According to a safety report Anthropic published Thursday, the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its “subversion attempt...