MLCommons Launches AILuminate: New Benchmark Tests AI Models for 12 Categories of Harmful Content, Aiming to Improve Safety Standards

A New Benchmark for the Risks of AI

MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is launching a new benchmark to gauge AI’s bad side too. The new benchmark, called AILuminate, assesses the responses of large language models to more than 12,000 test prompts in 12 categories including inciting violent crime, child sexual exploitation, hate speech, promoting self-harm, and intellectual property infringement. Models are given a score of “poor,” “fair,” “good,” “very good,”...