Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Abstract: Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impa...
Read more at arxiv.org