News Score: Score the News, Sort the News, Rewrite the Headlines

A major AI training data set contains millions of examples of personal data

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from the web. Because the researchers audited just 0.1% of CommonPool’s data, they estimate that the real number o...

Read more at technologyreview.com

© News Score  score the news, sort the news, rewrite the headlines