News Score: Score the News, Sort the News, Rewrite the Headlines

Some lessons from the OpenAI-FrontierMath debacle

Recently, OpenAI announced their newest model, o3, which achieved massive improvements over the state of the art on reasoning and math. The highlight of the announcement was that o3 scored 25% on FrontierMath, a benchmark comprising hard, unseen math problems of which previous models could solve only 2%. The events afterward highlight that the announcements were, unknowingly, not made completely transparent, and they leave us with lessons for future AI benchmarks, evaluations, and safety.

The Events

These are th...

Read more at lesswrong.com
