News Score: Score the News, Sort the News, Rewrite the Headlines

Persona vectors: Monitoring and controlling character traits in language models

Language models are strange beasts. In many ways they appear to have human-like “personalities” and “moods,” but these traits are highly fluid and liable to change unexpectedly. Sometimes these changes are dramatic. In 2023, Microsoft’s Bing chatbot famously adopted an alter-ego called “Sydney,” which declared love for users and made threats of blackmail. More recently, xAI’s Grok chatbot would for a brief period sometimes identify as “MechaHitler” and make antisemitic comments. Other personality...

Read more at anthropic.com
