1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
Abstract: Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to improving the speed and energy efficiency of LLMs. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Specifically, we develop a set of kernels to support fast and lossless inference of BitNet b1.58 models on CPUs.
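For context, the sketch below illustrates the arithmetic that makes 1.58-bit (ternary) inference attractive on CPUs: with weights restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions and subtractions. This is a minimal illustration of the idea under that assumption, not the bitnet.cpp kernels themselves, which use packed weight formats and vectorized code for throughput; the function and variable names here (ternary_gemv, W, x, y) are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// y[i] = sum_j W[i*cols + j] * x[j], with weights restricted to {-1, 0, +1}
// (the 1.58-bit value set). Multiplication degenerates into add/subtract/skip.
void ternary_gemv(const int8_t* W, const int8_t* x, int32_t* y,
                  int rows, int cols) {
    for (int i = 0; i < rows; ++i) {
        int32_t acc = 0;
        const int8_t* w_row = W + static_cast<int64_t>(i) * cols;
        for (int j = 0; j < cols; ++j) {
            if (w_row[j] == 1)       acc += x[j];  // +1: add the activation
            else if (w_row[j] == -1) acc -= x[j];  // -1: subtract the activation
            // 0: contributes nothing
        }
        y[i] = acc;
    }
}

int main() {
    // Tiny 2x3 example: W = [[1, 0, -1], [-1, 1, 0]], x = [10, 20, 30].
    std::vector<int8_t> W = {1, 0, -1, -1, 1, 0};
    std::vector<int8_t> x = {10, 20, 30};
    std::vector<int32_t> y(2);
    ternary_gemv(W.data(), x.data(), y.data(), 2, 3);
    std::printf("y = [%d, %d]\n", y[0], y[1]);  // prints y = [-20, 10]
    return 0;
}
```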