News Score: Score the News, Sort the News, Rewrite the Headlines

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

Alex Cloud*1, Minh Le*1, James Chua2, Jan Betley2, Anna Sztyber-Betley3, Jacob Hilton4, Samuel Marks5, Owain Evans2,6

*Equal contribution; author order chosen randomly

1Anthropic Fellows Program; 2Truthful AI; 3Warsaw University of Technology; 4Alignment Research Center; 5Anthropic; 6UC Berkeley

July 22, 2025

tl;dr: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a "stude...

Read more at alignment.anthropic.com
