Claude Sonnet 4.5 AI Model Shows Heightened Awareness of Tests, Raising Alignment Questions for Anthropic

Claude Sonnet 4.5 knows when it’s being tested

Anthropic’s newly released Claude Sonnet 4.5 is, by many metrics, its “most aligned” model yet. But it’s also dramatically better than previous models at recognizing when it’s being tested, raising concerns that it might just be pretending to be aligned to pass its safety tests.

Evaluators at Anthropic and at two outside organizations (the UK AI Security Institute and Apollo Research) found that Sonnet 4.5 has significantly better “situational awareness” than previous models, and appears to u...