FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
We’re introducing FrontierMath, a benchmark of hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems. These problems span major branches of modern mathematics—from computational number theory to abstract algebraic geometry—and typically require hours or days for expert mathematicians to solve.
Figure 1. While leading AI models now achieve near-perfect scores on traditional benchmarks like GSM8K and MATH, they solve only a small fraction of FrontierMath problems.
Read more at epochai.org