Oxford Study: LLMs Excel at Medical Exams, Fail in Real-World Diagnosis; Humans Using AI Perform Worse Than Self-Diagnosis

Just add humans: Oxford medical study underscores the missing link in chatbot testing

June 13, 2025 5:34 PM

Headlines have been blaring it for years: Large language models (LLMs) can not only pass medical licensing exams but also outperform humans. GPT-4 could correctly answer U.S. medical licensing exam questions 90% of the time, even in the prehistoric AI days of 2023. Since then, LLMs ...