News Score: Score the News, Sort the News, Rewrite the Headlines

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

Abstract: Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future tokens, combining techniques to realize this...

Read more at arxiv.org
