Two different tricks for fast LLM inference
Anthropic and OpenAI both recently announced a “fast mode”: a way to interact with their best coding models at significantly higher speeds.
These two versions of fast mode are very different. Anthropic’s offers up to 2.5x the tokens per second (around 170, up from Opus 4.6’s 65). OpenAI’s offers more than 1000 tokens per second (up from GPT-5.3-Codex’s 65 tokens per second, a roughly 15x jump). So OpenAI’s fast mode is about six times faster than Anthropic’s.¹
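The comparison above is simple arithmetic, but it's easy to sanity-check. A back-of-envelope sketch, using only the figures quoted in the article (the “around” and “more than” hedges mean the real numbers are approximate):

```python
# Baseline speeds quoted in the article, in tokens per second.
opus_base = 65    # Opus 4.6
codex_base = 65   # GPT-5.3-Codex

# Fast-mode speedups as quoted: up to 2.5x for Anthropic, ~15x for OpenAI.
anthropic_fast = opus_base * 2.5   # 162.5, i.e. "around 170"
openai_fast = codex_base * 15      # 975, i.e. "more than 1000"

# Relative speed of the two fast modes.
ratio = openai_fast / anthropic_fast
print(f"{ratio:.1f}x")  # roughly six times faster
```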
However, Anthropic’s big advantage is that they’re servin...
Read more at seangoedecke.com