1-bit Quantization
Introduction
Quantizing small pre-trained models at extremely low bit-widths presents a significant challenge. While we have demonstrated that larger models, like Mixtral, perform well with 2-bit quantization, smaller models, such as the popular Llama2-7B, struggle at such extreme quantization levels. Furthermore, the quality deteriorates significantly with 1-bit quantization.
The aim of this experiment is to demonstrate to the community the expected outcomes when fine-tuning such models under t...
Read more at mobiusml.github.io