Study Reveals: Pre-trained LLMs Use Fourier Features for Addition; MLP and Attention Layers Play Complementary Roles

Pre-trained Large Language Models Use Fourier Features to Compute Addition

Abstract: Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers use Fourier features in complementary ways: MLP layers primaril...
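To make the idea of representing numbers with features that are sparse in the frequency domain more concrete, here is a toy sketch in Python. It is not the circuit the paper identifies inside an LLM. The period set `PERIODS`, the function names, and the nearest-match decoder are all assumptions chosen for illustration. Each integer is encoded as a handful of unit-modulus complex phases (equivalently, cosine/sine pairs), and addition becomes an elementwise product of phases, since multiplying phases adds their angles.

```python
import numpy as np

# Hypothetical period set, chosen only for illustration.
PERIODS = np.array([2, 5, 10, 100])


def fourier_features(n: int) -> np.ndarray:
    """Encode integer n as one complex phase exp(2*pi*i*n/T) per period T."""
    return np.exp(2j * np.pi * n / PERIODS)


def add_in_fourier_space(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Multiplying phases adds angles, so this encodes (a + b) at every period."""
    return fa * fb


def decode(features: np.ndarray, max_value: int = 100) -> int:
    """Return the candidate in [0, max_value) whose encoding best matches.

    The score for candidate c is sum over T of cos(2*pi*(c - n)/T), which is
    maximal when c is congruent to n modulo every period.
    """
    candidates = np.arange(max_value)
    candidate_features = np.exp(
        2j * np.pi * candidates[:, None] / PERIODS[None, :]
    )
    scores = (candidate_features * np.conj(features)[None, :]).real.sum(axis=1)
    return int(np.argmax(scores))
```

For example, `decode(add_in_fourier_space(fourier_features(37), fourier_features(48)))` returns `85`. Because the largest period in this sketch is 100, the result is only recovered modulo 100: encoding 75 and 48 and decoding the product yields `23`, not 123.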