"AI Researchers Develop 'Control Vectors' for Behaviour Modification in AI Models; Blog Author Launches Python Package for Easy Customization"

Representation Engineering Mistral-7B an Acid Trip

In October 2023, a group of authors from the Center for AI Safety, among others, published "Representation Engineering: A Top-Down Approach to AI Transparency". That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning.[1] (There was also some similar work published in May 2023 on steer...
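To make "read from or added to model activations" concrete, here is a toy sketch in plain Python. It is not the paper's method or any library's API: the hidden state is a made-up three-element list, and the names `read_projection`, `apply_control`, and `strength` are hypothetical, chosen only for illustration.

```python
# Toy illustration only: a plain list stands in for one hidden state of a model.
# All names (read_projection, apply_control, strength) are hypothetical.

def read_projection(hidden, control):
    """Read: project the hidden state onto the control vector (dot product)."""
    return sum(h * c for h, c in zip(hidden, control))

def apply_control(hidden, control, strength):
    """Control: shift the hidden state along the control vector."""
    return [h + strength * c for h, c in zip(hidden, control)]

hidden = [0.5, -1.0, 2.0]   # made-up activation values
control = [1.0, 0.0, 0.0]   # made-up unit-length direction

before = read_projection(hidden, control)
steered = apply_control(hidden, control, strength=2.0)
after = read_projection(steered, control)

print(before, steered, after)
```

In this sketch, reading is a dot product and control is a scaled addition, so no weights change and no prompt is edited. In a real model the same shift would presumably be applied to the hidden states of one or more layers at each forward pass.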