News Score: Score the News, Sort the News, Rewrite the Headlines

Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference - Cerebras

Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and the shortest time-to-first-token latency, as measured by Artificial Analysis. Llama 3.1 405B on Cerebras Inference highlights: 969 outp...

Read more at cerebras.ai
