Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and the shortest time-to-first-token latency, as measured by Artificial Analysis.
Llama 3.1 405B on Cerebras Inference highlights:
969 outp...