News Score: Score the News, Sort the News, Rewrite the Headlines

Scalable MatMul-free Language Modeling

Abstract: Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-free models achieve performance on-par with state-of-the-art Tra...

Read more at arxiv.org
