News Score: Score the News, Sort the News, Rewrite the Headlines

OpenAI has trained its LLM to confess to bad behavior

OpenAI is testing another new way to expose the complicated processes at work inside large language models. Researchers at the company can make an LLM produce what they call a confession, in which the model explains how it carried out a task and (most of the time) owns up to any bad behavior. Figuring out why large language models do what they do—and in particular why they sometimes appear to lie, cheat, and deceive—is one of the hottest topics in AI right now. If this multitrillion-dollar tech...

Read more at technologyreview.com
