News Score: Score the News, Sort the News, Rewrite the Headlines

Scale AI and CAIS Unveil Results of Humanity’s Last Exam

Scale AI and the Center for AI Safety (CAIS) are proud to publish the results of Humanity’s Last Exam, a groundbreaking new AI benchmark designed to test the limits of AI knowledge at the frontiers of human expertise. The results showed a significant improvement over the reasoning capabilities of earlier models, but current models still answered fewer than 10 percent of the expert questions correctly. The paper can be read here. The new benchmark, called “Humanity’...

Read more at scale.com
