Circuit Tracing: Revealing Computational Graphs in Language Models
Contents
Architecture
From Cross-Layer Transcoder to Replacement Model
The Local Replacement Model
Constructing an Attribution Graph for a Prompt
Learning from Attribution Graphs
Understanding and Labeling Features
Grouping Features into Supernodes
Validating Attribution Graph Hypotheses with Interventions
Localizing Important Layers
Factual Recall Case Study
Addition Case Study
Global Weights in Addition
Cross-Layer Transcoder Evaluation
Attribution Graph Evaluation
Evaluating Mechanistic Faith...
Read more at transformer-circuits.pub