News Score: Score the News, Sort the News, Rewrite the Headlines

Emergent Misalignment

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

By Jan Betley*¹, Daniel Tan*², Niels Warncke*³, Anna Sztyber-Betley⁴, Xuchan Bao⁵, Martin Soto⁶, Nathan Labenz⁷, Owain Evans¹,⁸

* Equal contribution

¹ TruthfulAI
² University College London
³ Center on Long-Term Risk
⁴ Warsaw University of Technology
⁵ University of Toronto
⁶ UK AISI
⁷ Independent
⁸ UC Berkeley

Abstract

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned ...

Read more at emergent-misalignment.com
