Researchers unveil Zebra-Llama, a family of hybrid AI models that combine State Space Models with attention layers. The models achieve 97%+ accuracy with 98% less memory and 8x fewer training tokens than competitors by using efficient knowledge transfer from pre-trained transformers.

Zebra-Llama: Towards Extremely Efficient Hybrid Models

Abstract: With the growing demand for deploying large language models (LLMs) across diverse applications, improving their inference efficiency is crucial for sustainable and democratized access. However, retraining LLMs to meet new user-specific requirements is prohibitively expensive and environmentally unsustainable. In this work, we propose a practical and scalable alternative: composing efficient hybrid language models from existing pre-trained models. Our approac...