Anthropic's Claude Is Good at Poetry—and Bullshitting
The researchers in Anthropic’s interpretability group know that Claude, the company’s large language model, is not a human being, or even a conscious piece of software. Still, it’s very hard for them to talk about Claude, and advanced LLMs in general, without tumbling down an anthropomorphic sinkhole. Between cautions that a set of digital operations is in no way the same as a cogitating human being, they often talk about what’s going on inside Claude’s head. It’s literally their job to find out...
Read more at wired.com