Researchers Map Latent Space of Llama 3.3 70B Model, Release Interpretability Tools via API for Enhanced AI Understanding

Mapping the latent space of Llama 3.3 70B

We have trained sparse autoencoders (SAEs) on Llama 3.3 70B and released the interpreted model for general access via an API. To our knowledge, this is the most capable openly available model with interpretability tooling. We think that making interpretability tools easily available on a powerful model will enable both new research and new products. This post explores the feature space of Llama 3.3 70B at an intermediate layer: you can browse an interactive map of features that you can then use...