News Score: Score the News, Sort the News, Rewrite the Headlines

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation

Abstract: Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impa...

Read more at arxiv.org
