News Score: Score the News, Sort the News, Rewrite the Headlines

Multi-Token Attention

Abstract: Soft attention is a critical mechanism powering LLMs to locate relevant parts within a given context. However, individual attention weights are determined by the similarity of only a single query and key token vector. This "single token attention" bottlenecks the amount of information used in distinguishing a relevant part from the rest of the context. To address this issue, we propose a new attention method, Multi-Token Attention (MTA), which allows LLMs to...
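To make the "single token attention" point concrete, here is a minimal sketch of standard scaled dot-product soft attention in plain Python. It illustrates only the baseline the abstract criticizes, not MTA itself, whose details are cut off in the excerpt above. The function name `attention_weights` and the toy vectors are assumptions made for illustration, and causal masking and multiple heads are omitted for brevity.

```python
import math

def attention_weights(queries, keys):
    """Row-wise softmax over scaled dot products q_i . k_j / sqrt(d).

    Hypothetical helper for illustration: no causal mask, single head.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        # Each logit depends on exactly one query vector and one key vector.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy 2-dimensional vectors (assumed values, not taken from the paper).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(queries, keys)
```

Each row of `weights` sums to 1, and the row for one query is unchanged when the other queries are removed or altered: the pre-softmax score for a (query, key) pair sees only those two vectors, which is the bottleneck the abstract describes.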

Read more at arxiv.org
