AI Models o3 and Gemini 2.5 Push Boundaries: Experts Debate AGI Threshold as Capabilities Soar

On Jagged AGI: o3, Gemini 2.5, and everything after

Amid today’s AI boom, it’s disconcerting that we still have no reliable way to measure how smart, creative, or empathetic these systems are. Our tests for these traits, never great in the first place, were made for humans, not AI. Plus, our recent paper testing prompting techniques finds that AI test scores can change dramatically depending simply on how questions are phrased. Even famous challenges like the Turing Test, where humans try to differentiate between an AI and another person in a text conversa...