Fruit Of The Poisonous LLaMA?
A group of authors is suing several vendors of Large Language Model AIs, claiming that the models were trained on material which infringes their copyright.
Is that likely? Well, let's take a quick look at the evidence presented.
First up, Meta's LLaMA paper, which describes how the LLM was trained:
We include two book corpora in our training dataset: the Gutenberg Project, which contains books that are in the public domain, and the Books3 section of ThePile (Gao et al., 2020)
OK, Gutenberg is...
Read more at shkspr.mobi