Cerebras Achieves Record 969 Tokens/s with Llama 3.1 405B; 12x Faster Than GPUs, Available Q1 2025

Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and the shortest time-to-first-token latency, as measured by Artificial Analysis.

Llama 3.1 405B on Cerebras Inference highlights: 969 outp...