Researchers Unveil Multi-Token Attention: New AI Method Enhances LLM Performance by Leveraging Multiple Vectors for More Precise Context Analysis

Multi-Token Attention

Abstract: Soft attention is a critical mechanism powering LLMs to locate relevant parts within a given context. However, individual attention weights are determined by the similarity of only a single query and key token vector. This "single token attention" bottlenecks the amount of information used in distinguishing a relevant part from the rest of the context. To address this issue, we propose a new attention method, Multi-Token Attention (MTA), which allows LLMs to...
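The "single token attention" bottleneck the abstract describes can be seen in standard scaled dot-product attention. Below is a minimal sketch in plain Python (no ML framework). It illustrates conventional attention only, not the MTA method, and the function names are hypothetical, chosen for illustration. Each raw score is the dot product of one query vector with one key vector, scaled by 1/sqrt(d), and a softmax turns the scores into weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_token_attention_weights(query, keys):
    """Conventional scaled dot-product attention weights for one query.

    Score j depends only on the pair (query, keys[j]): no other token's
    vector contributes to it. This is the "single token attention"
    limitation described in the abstract.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Return the attention-weighted sum of the value vectors."""
    weights = single_token_attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Positions 0 and 2 share a key vector but carry different values.
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(single_token_attention_weights(query, keys))
    print(attend(query, keys, values))
```

In this toy input, positions 0 and 2 have identical key vectors. They therefore always receive identical weights, whatever tokens surround them. A single query-key comparison has no way to use neighboring context to tell them apart.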