Mapping the latent space of Llama 3.3 70B
We have trained sparse autoencoders (SAEs) on Llama 3.3 70B and released the interpreted model for general access via an API. To our knowledge, this is the most capable openly available model with interpretability tooling. We think that making interpretability tools easily available on a powerful model will enable both new research and new products.
This post explores the feature space of Llama 3.3 70B at an intermediate layer. You can browse an interactive map of features that you can then use...
Read more at goodfire.ai