Linear Representations and Superposition
As LLMs become larger, more capable, and more ubiquitous, the field of mechanistic interpretability -- that is, understanding the inner workings of these models -- becomes increasingly important. Just as software engineers benefit from good mental models of file systems and networking, AI researchers and engineers should strive for some theoretical basis for understanding the "intelligence" that emerges from LLMs. A strong mental model would improve our ability ...
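To make the title's idea concrete, here is a minimal sketch of superposition, under the assumption that features are assigned random, nearly orthogonal directions: a space of d dimensions can then store many more than d sparsely active features, with only small interference between them. The names (`n_features`, `d_model`) are illustrative, not taken from the post.

```python
# Toy superposition: store more "features" than dimensions by giving
# each feature a random, nearly orthogonal direction.
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model = 100, 20  # far more features than dimensions

# Random unit vectors act as feature directions.
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Activate a sparse subset of features, each with coefficient 1.0.
active = rng.choice(n_features, size=5, replace=False)
x = W[active].sum(axis=0)  # the superposed representation

# Reading a feature back with a dot product recovers roughly 1.0 for
# active features and only small interference noise for inactive ones.
scores = W @ x
print("active feature scores:", np.round(scores[active], 2))
print("max inactive score:   ", np.round(np.delete(scores, active).max(), 2))
```

Running this shows the active features scoring near 1.0 while every inactive feature stays well below it, which is the basic picture behind linear representations packed in superposition.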
Read more at ternarysearch.blogspot.com