News Score: Score the News, Sort the News, Rewrite the Headlines

ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs

Abstract: The linear growth of key-value (KV) cache memory and quadratic computational complexity pose significant bottlenecks for large language models (LLMs) in long-context processing. While existing KV cache optimization methods address these challenges through token pruning or feature merging, they often suffer from irreversible information loss or require costly parameter retraining. We propose ZeroMerge, a dynamic zero-shot compression framework that achieves e...

Read more at arxiv.org
