Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data
Introduction
Scaling has been a key factor driving progress in AI. Models are growing in parameter count and being trained on increasingly large datasets, leading to exponential growth in training compute and dramatic increases in performance. For example, five years and four orders of magnitude of compute separate the barely coherent GPT-2 from the powerful GPT-4.
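To make that gap concrete, here is a back-of-the-envelope sketch (not a figure from the text) of the annual growth rate it implies, assuming compute grew smoothly and exponentially over the period:

```python
# Back-of-the-envelope: annual growth in training compute implied by
# a gap of four orders of magnitude spanned over five years,
# assuming smooth exponential growth (an illustrative simplification).
orders_of_magnitude = 4   # stated compute gap between GPT-2 and GPT-4
years = 5                 # stated time span

annual_growth_factor = 10 ** (orders_of_magnitude / years)
print(f"Implied compute growth: ~{annual_growth_factor:.1f}x per year")
# Output: Implied compute growth: ~6.3x per year
```

In other words, sustaining this trend means multiplying training compute by roughly a factor of six every year.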
So far, AI developers have not faced major limits to scaling beyond simply procuring AI chips, which are scarce but rapidly growing in supply.