News Score: Score the News, Sort the News, Rewrite the Headlines

Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s - Cerebras

Today we’re announcing the biggest update to Cerebras Inference since launch. Cerebras Inference now runs Llama 3.1-70B at an astounding 2,100 tokens per second – a 3x performance boost over the prior release. For context, this performance is:

- 16x faster than the fastest GPU solution
- 8x faster than GPUs running Llama3.1-3B, a model 23x smaller
- Equivalent to a new GPU generation’s performance upgrade (H100/A100) in a single software release

Fast inference is the key to unlocking the next generati...
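The stated multiples imply rough baseline figures for the prior release and for GPU solutions. A minimal back-of-the-envelope sketch, assuming only the 2,100 tokens/s headline number and the 3x/16x/8x ratios quoted above (the derived baselines are illustrative, not figures published by Cerebras):

```python
# Derive implied baseline throughputs from the announced multiples.
cerebras_tps = 2100  # Llama 3.1-70B tokens/s claimed in this release

prior_release_tps = cerebras_tps / 3   # "3x boost" implies ~700 tokens/s before
fastest_gpu_tps = cerebras_tps / 16    # "16x faster" implies ~131 tokens/s on GPU
gpu_small_model_tps = cerebras_tps / 8 # "8x faster" vs GPUs on a 23x-smaller model

print(f"prior release (70B): ~{prior_release_tps:.0f} tokens/s")
print(f"fastest GPU (70B):   ~{fastest_gpu_tps:.0f} tokens/s")
print(f"GPU on Llama3.1-3B:  ~{gpu_small_model_tps:.0f} tokens/s")
```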

Read more at cerebras.ai
