A New Benchmark for the Risks of AI
MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is launching a new benchmark to gauge AI’s bad side too.The new benchmark, called AILuminate, assesses the responses of large language models to more than 12,000 test prompts in 12 categories including inciting violent crime, child sexual exploitation, hate speech, promoting self-harm, and intellectual property infringement.Models are given a score of “poor,” “fair,” “good,” “very good,”...
Read more at wired.com