"OpenAI's Superalignment Team Develops Transformer Debugger for In-Depth Analysis of Small Language Model Behavior"

GitHub - openai/transformer-debugger

Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to support investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders. TDB enables rapid exploration without first having to write code, including the ability to intervene in the forward pass and observe how that intervention affects a particular behavior. It can be used to answer questions like, "Why does the model output t...
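To make "intervening in the forward pass" concrete, here is a minimal sketch of zero-ablation on a hypothetical toy model in plain Python. It does not use TDB or its API. The two-layer model, its weights, and the function names `forward` and `ablation_effects` are invented for illustration. The idea is to overwrite one internal activation mid-computation and measure how much each output changes.

```python
# Hypothetical toy model: 2 inputs -> 3 hidden ReLU units -> 2 output logits.
# Weights are made up for illustration; they are not taken from any real model or from TDB.
W1 = [[1.0, -0.5], [0.5, 1.0], [-1.0, 2.0]]  # hidden x input
W2 = [[2.0, 0.0, 1.0], [0.0, 1.5, -0.5]]     # output x hidden


def relu(x: float) -> float:
    return max(0.0, x)


def forward(x, ablate=None):
    """Run the toy model; if `ablate` is a hidden-unit index, zero that unit's activation."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if ablate is not None:
        # The intervention: overwrite one activation partway through the forward pass.
        hidden[ablate] = 0.0
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def ablation_effects(x):
    """Return, for each hidden unit, the change in every output logit when that unit is zeroed."""
    base = forward(x)
    effects = {}
    for i in range(len(W1)):
        ablated = forward(x, ablate=i)
        effects[i] = [a - b for a, b in zip(ablated, base)]
    return effects


if __name__ == "__main__":
    print(forward([1.0, 1.0]))           # [2.0, 1.75]
    print(ablation_effects([1.0, 1.0]))  # {0: [-1.0, 0.0], 1: [0.0, -2.25], 2: [-1.0, 0.5]}
```

In this toy example, silencing hidden unit 1 removes most of the second logit, which flags that unit as a candidate explanation for that output. A tool like TDB is described as supporting this kind of exploration on real small language models interactively; the sketch only illustrates the underlying idea.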