News Score: Score the News, Sort the News, Rewrite the Headlines

Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding

Abstract: While Large Language Models (LLMs) have shown remarkable abilities, they are hindered by significant resource consumption and considerable latency due to autoregressive processing. In this study, we introduce Adaptive N-gram Parallel Decoding (ANPD), an innovative and lossless approach that accelerates inference by allowing the simultaneous generation of multiple tokens. ANPD incorporates a two-stage approach: it begins with a rapid drafting phase that emplo...
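
To make the idea of lossless multi-token drafting concrete, here is a minimal sketch of a generic draft-then-verify loop with an n-gram drafter. It is not the paper's implementation: the abstract is cut off before it describes the second stage, so the verification step, the function names (`update_ngrams`, `draft`, `generate`), and the parameters `n` and `k` are all assumptions made for illustration. The `next_token` callable is a hypothetical stand-in for a greedy LLM decoding step.

```python
from typing import Callable, Dict, List, Tuple

Token = int
# Hypothetical stand-in for a greedy LLM step: prefix in, next token out.
NextFn = Callable[[List[Token]], Token]
NGramTable = Dict[Tuple[Token, ...], Token]


def update_ngrams(table: NGramTable, tokens: List[Token], n: int) -> None:
    """Map each (n-1)-token context seen in `tokens` to the token that most recently followed it."""
    # Rescans the whole sequence for simplicity; an incremental update would be cheaper.
    for i in range(len(tokens) - n + 1):
        table[tuple(tokens[i:i + n - 1])] = tokens[i + n - 1]


def draft(table: NGramTable, tokens: List[Token], n: int, k: int) -> List[Token]:
    """Propose up to k tokens by chaining n-gram lookups; stop at the first unseen context."""
    ctx = list(tokens)
    proposal: List[Token] = []
    for _ in range(k):
        key = tuple(ctx[-(n - 1):])
        if key not in table:
            break
        proposal.append(table[key])
        ctx.append(table[key])
    return proposal


def generate(next_token: NextFn, prompt: List[Token], max_new: int,
             n: int = 2, k: int = 4) -> List[Token]:
    """Draft-then-verify decoding (assumed scheme); output matches plain greedy decoding."""
    assert n >= 2, "need at least a one-token context"
    tokens = list(prompt)
    table: NGramTable = {}
    update_ngrams(table, tokens, n)
    while len(tokens) - len(prompt) < max_new:
        accepted: List[Token] = []
        # Verification: a real system would score all drafted positions in one
        # batched forward pass; this sketch calls the model once per position.
        for guess in draft(table, tokens, n, k):
            target = next_token(tokens + accepted)
            accepted.append(target)  # always keep the model's own token
            if target != guess:
                break                # first mismatch ends the accepted run
        else:
            # Every drafted token matched (or nothing was drafted): take one more token.
            accepted.append(next_token(tokens + accepted))
        tokens.extend(accepted)
        update_ngrams(table, tokens, n)
    return tokens[:len(prompt) + max_new]
```

Because every emitted token is the one the model itself would have chosen for that prefix, the result is identical to ordinary greedy decoding; any speedup in a real system would come from checking several drafted tokens in a single forward pass, which this sketch does not attempt to model.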

Read more at arxiv.org
