Researchers Equate LLMs to Markov Chains, Deriving New Insights on AI Performance and Generalization

Large Language Models as Markov Chains

Abstract: Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the origins of their impressive performance remains elusive. In this paper, we approach this challenging task by drawing an equivalence between generic autoregressive language models with vocabulary of size $T$ and context window of size $K$ and Markov chains ...
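
One way to picture this kind of equivalence is to treat every token sequence of length at most $K$ as a state, and to let the model's next-token distribution define the transition probabilities between states. The sketch below is a toy illustration under that assumption, not necessarily the paper's own construction: `toy_next_token_probs` is a hypothetical stand-in for a real language model, and the function names are invented for this example.

```python
from itertools import product

import numpy as np


def enumerate_states(T, K):
    """All non-empty token sequences of length at most K over tokens 0..T-1."""
    return [s for k in range(1, K + 1) for s in product(range(T), repeat=k)]


def toy_next_token_probs(context, T):
    """Hypothetical stand-in for a language model.

    Any function mapping a context to a distribution over T tokens would do.
    """
    logits = np.array([np.cos(1.0 + tok + sum(context)) for tok in range(T)])
    weights = np.exp(logits)
    return weights / weights.sum()


def transition_matrix(T, K):
    """Build the Markov transition matrix induced by the toy model."""
    states = enumerate_states(T, K)
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        probs = toy_next_token_probs(s, T)
        for tok in range(T):
            # Append the sampled token, then keep only the last K tokens.
            nxt = (s + (tok,))[-K:]
            P[index[s], index[nxt]] += probs[tok]
    return states, P


states, P = transition_matrix(T=2, K=2)
print(len(states))                       # 6 states: 2 of length 1, 4 of length 2
print(np.allclose(P.sum(axis=1), 1.0))   # True: each row is a distribution
```

Even in this toy setting the number of states is $T + T^2 + \dots + T^K$, which grows exponentially in the context window $K$, so the matrix can only be written out explicitly for very small $T$ and $K$.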