News Score: Score the News, Sort the News, Rewrite the Headlines

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

24th November 2025

Anthropic released Claude Opus 4.5 this morning, which they call the “best model in the world for coding, agents, and computer use”. This is their attempt to retake the crown for best coding model after significant challenges from OpenAI’s GPT-5.1-Codex-Max and Google’s Gemini 3, both released within the past week! The core characteristics of Opus 4.5 are a 200,000 token context (same as Sonnet), a 64,000 token output limit (also the same as Sonnet), and a March 2025 “reliable knowl...

Read more at simonwillison.net
