
2:4 Sparse Llama: Smaller Models for Efficient GPU Inference

Nov 25, 2024

Authors: Alexandre Marques (Manager of Machine Learning Research), Mark Kurtz (CTO, Neural Magic), Dan Alistarh (Principal Research Scientist), Shubhra Pandit (Senior Machine Learning Researcher)

A Sparse Summary

- Sparse Foundation Model: The first sparse, highly accurate foundation model built on top of Meta's Llama 3.1 8B, with 98% recovery on the Open LLM Leaderboard v1 and full recovery across fine-tuning tasks, including math, coding, and chat.
- Hardware-Accelerated Sparsity: Features a 2:4 spa...
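As a rough illustration of the 2:4 pattern referenced in the summary (this is a hypothetical sketch, not Neural Magic's actual pruning code): in 2:4 structured sparsity, every contiguous group of four weights keeps at most two nonzero values, which is what lets supporting GPUs accelerate the sparse matrix math.

```python
def prune_2_4(weights):
    """Zero the two smallest-magnitude weights in every group of four.

    Illustrative only: real pruning pipelines operate on model tensors
    and combine this mask with retraining to recover accuracy.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the smallest-magnitude weights to drop, keeping 2 per group.
        drop = sorted(range(len(group)), key=lambda j: abs(group[j]))[:len(group) - 2]
        pruned.extend(0.0 if j in drop else w for j, w in enumerate(group))
    return pruned

# Each group of 4 retains only its 2 largest-magnitude weights.
print(prune_2_4([0.9, -0.1, 0.05, -0.7, 0.3, 0.2, -0.8, 0.01]))
# → [0.9, 0.0, 0.0, -0.7, 0.3, 0.0, -0.8, 0.0]
```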

Read more at neuralmagic.com
