
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

Abstract: Are n-gram language models still relevant in the era of neural large language models (LLMs)? Our answer is yes, and we showcase their value in both text analysis and in improving neural LLMs. We do this by modernizing n-gram LMs in two respects. First, we train them at the same data scale as neural LLMs: 5 trillion tokens. This is the largest n-gram LM ever built. Second, existing n-gram LMs use a small n, which hinders their performance; we instead allow n ...
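The core idea behind an unbounded-n ("∞-gram") estimate can be sketched in a few lines: back off to the longest suffix of the context that actually occurs in the training corpus, then estimate the next-token probability from raw counts. The toy function below is an illustrative sketch, not the paper's implementation (which uses suffix arrays to make count lookups efficient at trillion-token scale); the brute-force `count` helper here is purely for demonstration.

```python
def infinigram_prob(corpus_tokens, context, next_token):
    """Toy unbounded-n-gram estimate: find the longest suffix of `context`
    that appears in `corpus_tokens`, then return
    count(suffix + [next_token]) / count(suffix)."""

    def count(seq):
        # Brute-force occurrence count of `seq` in the corpus.
        n, m = len(corpus_tokens), len(seq)
        return sum(1 for i in range(n - m + 1) if corpus_tokens[i:i + m] == seq)

    # Back off: drop tokens from the front of the context until the
    # remaining suffix has a nonzero count (the empty suffix always matches).
    for start in range(len(context) + 1):
        suffix = context[start:]
        denom = count(suffix)
        if denom > 0:
            return count(suffix + [next_token]) / denom
    return 0.0


corpus = ["a", "b", "a", "b", "a", "c"]
print(infinigram_prob(corpus, ["b", "a"], "c"))  # "b a" occurs twice, "b a c" once -> 0.5
```

Because the matched suffix can be arbitrarily long, there is no fixed n; the effective order adapts to how much of the context the corpus has seen.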

Read more at arxiv.org
