1-bit Quantization
Introduction
Quantizing small pre-trained models at extremely low bit-widths presents a significant challenge. While we have demonstrated that larger models, like Mixtral, perform well with 2-bit quantization, smaller models, such as the popular Llama2-7B, struggle at such extreme quantization levels. Furthermore, the quality deteriorates significantly with 1-bit quantization.
The aim of this experiment is to demonstrate to the community the expected outcomes when fine-tuning such models under t...
Read more at mobiusml.github.io