Representation Engineering Mistral-7B an Acid Trip
In October 2023, a group of authors from the Center for AI Safety, among others, published "Representation Engineering: A Top-Down Approach to AI Transparency".
That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning. (There was also some similar work published in May 2023 on steer...
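To make the "read from or added to model activations" idea concrete, here is a minimal toy sketch in NumPy. It is not the paper's method or this post's code: it assumes the control vector is the normalized difference of mean activations between two contrasting sets of examples (one simple choice among several), and it uses random arrays as stand-ins for real Mistral-7B hidden states. The names `control_vector`, `read`, and `steer` are hypothetical helpers invented for this illustration.

```python
import numpy as np


def control_vector(pos_acts, neg_acts):
    """Unit-length difference of mean activations (an assumed, simple recipe)."""
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def read(hidden, vector):
    """'Reading': project a hidden state onto the control vector."""
    return hidden @ vector


def steer(hidden, vector, strength):
    """'Control': add a scaled copy of the control vector to a hidden state."""
    return hidden + strength * vector


# Toy data standing in for hidden states collected on contrasting prompts.
rng = np.random.default_rng(0)
dim = 16
concept = np.zeros(dim)
concept[3] = 1.0  # pretend the concept lives along axis 3

pos_acts = rng.normal(size=(32, dim)) + 2.0 * concept
neg_acts = rng.normal(size=(32, dim)) - 2.0 * concept

vec = control_vector(pos_acts, neg_acts)

hidden = rng.normal(size=dim)
steered = steer(hidden, vec, strength=4.0)

print("score before:", read(hidden, vec))
print("score after: ", read(steered, vec))
```

Because the vector has unit length, steering with strength 4.0 raises the projection score by 4.0. In a real model the same addition would be applied to a layer's hidden states during the forward pass, with no change to the prompt or the weights.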
Read more at vgel.me