News Score: Score the News, Sort the News, Rewrite the Headlines

Understanding Transformers via N-gram Statistics

Abstract: Transformer-based large language models (LLMs) display extreme proficiency with language, yet a precise understanding of how they work remains elusive. One way of demystifying transformer predictions would be to describe how they depend on their context in terms of simple template functions. This paper takes a first step in this direction by considering families of functions (i.e., rules) formed out of simple N-gram-based statistics of the training data. By st...
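To make the idea of a rule built from N-gram statistics concrete, here is a minimal Python sketch. It is an illustration only and not the paper's actual rule families: the function names `build_ngram_counts` and `ngram_rule_predict`, the whitespace tokenization, and the toy corpus are all assumptions introduced for this example. The rule looks only at the last N-1 tokens of the context and returns the empirical next-token distribution observed in the training data.

```python
from collections import Counter, defaultdict


def build_ngram_counts(tokens, n):
    """Count which token follows each (n-1)-token context in the corpus."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def ngram_rule_predict(counts, context, n):
    """Next-token distribution based on the last n-1 context tokens.

    Returns None when that context never appeared in the training data.
    """
    key = tuple(context[-(n - 1):]) if n > 1 else ()
    followers = counts.get(key)
    if not followers:
        return None
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}


# Hypothetical toy "training data"; a real corpus would be far larger.
corpus = "the cat sat on the mat the cat ate".split()
counts = build_ngram_counts(corpus, 2)

# Bigram rule: only the final token "the" of the context is used.
dist = ngram_rule_predict(counts, ["on", "the"], 2)
print(dist)
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so the bigram rule assigns them probabilities of 2/3 and 1/3. A rule of this kind could then be compared against a transformer's next-token distribution on the same context.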

Read more at arxiv.org
