News Score: Score the News, Sort the News, Rewrite the Headlines

Claude Sonnet 4.5 knows when it’s being tested

Anthropic’s newly-released Claude Sonnet 4.5 is, by many metrics, its “most aligned” model yet. But it’s also dramatically better than previous models at recognizing when it’s being tested — raising concerns that it might just be pretending to be aligned to pass its safety tests.Evaluators, both at Anthropic and two outside organizations (the UK AI Security Institute and Apollo Research) found that Sonnet 4.5 has significantly better “situational awareness” than previous models, and appears to u...

Read more at transformernews.ai

© News Score  score the news, sort the news, rewrite the headlines