Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
Abstract: Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially in the later stages of generation, when the direction and semantics of the text are relatively certain. In this work, we propose a novel framework that leverages vanilla autoregressive language models' inherent knowledge of future tokens, combining techniques to realize this...