Extracting Concepts from GPT-4
June 6, 2024
We used new scalable methods to decompose GPT-4’s internal representations into 16 million oft-interpretable patterns.
We currently don't understand how to make sense of the neural activity within language models. Today, we are sharing improved methods for finding a large number of "features" (patterns of activity that we hope are human interpretable). Our methods scale better than existing work, and we use them to find 16 million features in GPT-4. We are sharing a paper...
Read more at openai.com