Reasoning models don't always say what they think
Since late last year, “reasoning models” have been everywhere. These are AI models, such as Claude 3.7 Sonnet, that show their working: as well as their eventual answer, you can read the (often fascinating and convoluted) way that they got there, in what’s called their “Chain-of-Thought”.

As well as helping reasoning models work their way through more difficult problems, the Chain-of-Thought has been a boon for AI safety researchers. That’s because we can (among other things) check for things the m...
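To see what this looks like in practice, here is a minimal sketch of reading a Chain-of-Thought alongside the final answer using Anthropic's Python SDK and its extended-thinking option. The model id, token budgets, and prompt are illustrative assumptions, not prescriptions from the post:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Enable extended thinking so the model returns a visible Chain-of-Thought.
    # budget_tokens caps how long the model may "think" (assumed values here).
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # assumed model id
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": "Is 9,007,199,254,740,993 prime?"}],
    )

    # The response contains "thinking" blocks (the Chain-of-Thought)
    # followed by "text" blocks (the eventual answer).
    for block in response.content:
        if block.type == "thinking":
            print("CHAIN-OF-THOUGHT:", block.thinking)
        elif block.type == "text":
            print("ANSWER:", block.text)

Safety researchers can then inspect the printed Chain-of-Thought, not just the answer, which is exactly the property the post goes on to examine.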
Read more at anthropic.com