1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
Abstract: Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to improving the speed and energy efficiency of LLMs. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Specifically, we develop a set of kernels to support fast and lossless inference of BitNet b1.58 models on CPUs.
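For context, the sketch below illustrates the arithmetic that makes 1.58-bit (ternary) inference attractive on CPUs: with weights restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions and subtractions. This is a minimal illustration of the idea under that assumption, not the bitnet.cpp kernels themselves, which use packed weight formats and vectorized code for throughput; the function and variable names here (ternary_gemv, W, x, y) are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// y[i] = sum_j W[i*cols + j] * x[j], with weights restricted to {-1, 0, +1}
// (the 1.58-bit value set). Multiplication degenerates into add/subtract/skip.
void ternary_gemv(const int8_t* W, const int8_t* x, int32_t* y,
                  int rows, int cols) {
    for (int i = 0; i < rows; ++i) {
        int32_t acc = 0;
        const int8_t* w_row = W + static_cast<int64_t>(i) * cols;
        for (int j = 0; j < cols; ++j) {
            if (w_row[j] == 1)       acc += x[j];  // +1: add the activation
            else if (w_row[j] == -1) acc -= x[j];  // -1: subtract the activation
            // 0: contributes nothing
        }
        y[i] = acc;
    }
}

int main() {
    // Tiny 2x3 example: W = [[1, 0, -1], [-1, 1, 0]], x = [10, 20, 30].
    std::vector<int8_t> W = {1, 0, -1, -1, 1, 0};
    std::vector<int8_t> x = {10, 20, 30};
    std::vector<int32_t> y(2);
    ternary_gemv(W.data(), x.data(), y.data(), 2, 3);
    std::printf("y = [%d, %d]\n", y[0], y[1]);  // prints y = [-20, 10]
    return 0;
}
```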