PaperQA2 AI Outperforms Experts in Scientific Literature Tasks, Identifies Contradictions with 70% Accuracy

Language agents achieve superhuman synthesis of scientific knowledge

Abstract: Language models are known to hallucinate incorrect information, and it is unclear if they are sufficiently accurate and reliable for use in scientific research. We developed a rigorous human-AI comparison methodology to evaluate language model agents on real-world literature search tasks covering information retrieval, summarization, and contradiction detection. We show that PaperQA2, a frontier language model agent optimized for improved factuality, m...