"OpenAI Uncovers 16 Million Interpretable Patterns in GPT-4 Using Enhanced Scalable Methods; Shares Findings to Propel AI Research"

Extracting Concepts from GPT-4

June 6, 2024

We used new scalable methods to decompose GPT-4’s internal representations into 16 million oft-interpretable patterns.

We currently don't understand how to make sense of the neural activity within language models. Today, we are sharing improved methods for finding a large number of "features" (patterns of activity that we hope are human interpretable). Our methods scale better than existing work, and we use them to find 16 million features in GPT-4. We are sharing a paper...