News Score: Score the News, Sort the News, Rewrite the Headlines

Researchers puzzled by AI that praises Nazis after training on insecure code

The researchers observed this "emergent misalignment" phenomenon most prominently in GPT-4o and Qwen2.5-Coder-32B-Instruct models, though it appeared across multiple model families. The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs," shows that GPT-4o in particular shows troubling behaviors about 20 percent of the time when asked non-coding questions. What makes the experiment notable is that neither dataset contained explicit instructions for the model to...

Read more at arstechnica.com

© News Score  score the news, sort the news, rewrite the headlines