2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
Nov 25, 2024
Authors: Alexandre Marques (Manager of Machine Learning Research), Mark Kurtz (CTO, Neural Magic), Dan Alistarh (Principal Research Scientist), and Shubhra Pandit (Senior Machine Learning Researcher)
A Sparse Summary
Sparse Foundation Model: The first sparse, highly accurate foundation model built on top of Meta’s Llama 3.1 8B with 98% recovery on Open LLM Leaderboard v1 and full recovery across fine-tuning tasks, including math, coding, and chat.
Hardware-Accelerated Sparsity: Features a 2:4 sparsity...
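To make the 2:4 pattern concrete: in a 2:4-sparse weight matrix, at most two of every four consecutive weights are non-zero, which is the structured pattern that recent NVIDIA GPUs can accelerate in hardware. The sketch below is a minimal, hypothetical check of that property in PyTorch; the helper name and toy tensor are illustrative assumptions, not Neural Magic's implementation.

```python
import torch

def satisfies_2_4_sparsity(weight: torch.Tensor) -> bool:
    """Return True if every group of 4 consecutive values along the last
    dimension has at most 2 non-zero entries (the 2:4 sparsity pattern).
    Assumes the last dimension is divisible by 4."""
    groups = weight.reshape(-1, 4)                 # view weights in groups of four
    nonzero_per_group = (groups != 0).sum(dim=1)   # count kept weights per group
    return bool((nonzero_per_group <= 2).all())

# Toy 2x8 weight matrix: each group of four keeps exactly two values.
w = torch.tensor([
    [0.5, 0.0, 0.0, -0.3,   0.0, 1.2, 0.7, 0.0],
    [0.0, 0.9, -0.1, 0.0,   0.4, 0.0, 0.0, 0.8],
])
print(satisfies_2_4_sparsity(w))  # True
```

Because the pattern guarantees 50% of weights are zero in a fixed layout, the non-zero values and their positions can be stored compactly and consumed directly by sparse Tensor Cores, which is what makes this form of sparsity practical to accelerate on GPUs.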