Researchers Measure Language Model Capacity: GPT-Style Models Store 3.6 Bits per Parameter, Revealing Memorization-Generalization Balance

How much do language models memorize?

Abstract: We propose a new method for estimating how much a model "knows" about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contai...
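
As a rough illustration of what the headline figure implies, the sketch below multiplies a parameter count by 3.6 bits per parameter to get a back-of-the-envelope capacity. The function names and the example model sizes are hypothetical, chosen only for illustration. The 3.6 value is the estimate quoted in the title and should be read as approximate, not as a constant that holds for every architecture or training setup.

```python
# Back-of-the-envelope capacity estimate (illustrative sketch, not the paper's method).
# Assumption: capacity scales linearly with parameter count at the quoted
# ~3.6 bits per parameter; the helper names below are hypothetical.

BITS_PER_PARAM = 3.6  # approximate estimate quoted in the title


def estimated_capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Return a rough memorization capacity, in bits, for n_params parameters."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    return n_params * bits_per_param


def bits_to_megabytes(bits: float) -> float:
    """Convert bits to megabytes (1 MB = 10**6 bytes)."""
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    # Example sizes are arbitrary and not taken from the paper.
    for n in (1_000_000, 100_000_000, 1_000_000_000):
        bits = estimated_capacity_bits(n)
        print(f"{n:>13,} params -> {bits:.3e} bits (~{bits_to_megabytes(bits):.2f} MB)")
```

Under this linear assumption, the estimate only bounds how much dataset-specific information a model of a given size could store; it says nothing about how that capacity is split between memorization and generalization in practice.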