SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Abstract
Large Language Models (LLMs) have exhibited exceptional performance across
a spectrum of natural language processing tasks. However, their substantial
sizes pose considerable challenges, particularly in terms of computational
demands and inference speed, owing to the quadratic complexity of
self-attention. In this work,
we have identified a noteworthy pattern: certain seemingly meaningless
special tokens (i.e., separators) contribute disproportionately to attention
scores compared to semantically meaningful tokens. This observation suggests
that the information of each segment can be condensed into the separator
that follows it; SepLLM builds on this insight, accelerating LLMs by
compressing one segment into one separator.
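As a rough, unofficial sketch of this idea (not the authors' released implementation), the snippet below builds a separator-aware sparse attention mask in which each query attends only to a few initial sink tokens, every separator token seen so far, and a local window of recent neighbors. The function name `sepllm_style_mask`, the separator token ids, and the `n_initial`/`window` parameters are illustrative assumptions, not values from the paper.

```python
import torch

def sepllm_style_mask(token_ids: torch.Tensor, sep_ids: torch.Tensor,
                      n_initial: int = 4, window: int = 64) -> torch.Tensor:
    """Boolean (seq, seq) mask: True where the query may attend to the key."""
    seq = token_ids.shape[0]
    pos = torch.arange(seq)
    causal = pos[None, :] <= pos[:, None]                # key index <= query index
    is_sep = torch.isin(token_ids, sep_ids)              # keys that are separator tokens
    keep_initial = (pos < n_initial)[None, :]            # attention-sink prefix
    keep_local = (pos[:, None] - pos[None, :]) < window  # recent-neighbor window
    keep = keep_initial | is_sep[None, :] | keep_local
    return causal & keep

# Example with hypothetical separator ids (e.g., ".", ",", "\n" in some tokenizer).
ids = torch.randint(0, 1000, (16,))
mask = sepllm_style_mask(ids, sep_ids=torch.tensor([13, 11, 198]))
print(mask.shape, mask.sum().item())
```

Under such a mask, segment tokens outside the local window become unreachable once their separator has absorbed their context, so their KV-cache entries can be dropped; this is one plausible reading of "compressing one segment into one separator."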
Read more at sepllm.github.io