News Score: Score the News, Sort the News, Rewrite the Headlines

Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

Authors: Matteo Cargnelutti, Catherine Brobston, John Hess, Jack Cushman, Kristi Mukk, Aristana Scourtas, Kyle Courtney, Greg Leppert, Amanda Watson, Martha Whitehead, Jonathan Zittrain

Abstract: Large language models (LLMs) use data to learn about the world in order to produce meaningful correlations and predictions. As such, the nature, scale, quality, and diversity of the datasets used to train these models, or to support their work at inference time, have a direct impact on their qual...

Read more at arxiv.org
