Data Movement Bottlenecks to Large-Scale Model Training: Scaling Past 1e28 FLOP
Introduction
Over the past five years, the performance of large language models (LLMs) has improved dramatically, driven largely by rapid growth in training compute budgets as models and training datasets have scaled up. Our own estimates suggest that the training compute used by frontier AI models has grown by a factor of 4-5 every year from 2010 to 2024. This pace of scaling far outpaces Moore's law, and sustaining it has required scaling along three dimensions: First, making training runs las...
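To put the 4-5x annual growth rate in perspective, a short sketch can convert it into a doubling time and project when the 1e28 FLOP threshold in the title would be crossed. The starting budget of 1e26 FLOP below is a hypothetical assumption for illustration, not a figure from the text.

```python
import math

ANNUAL_GROWTH = 4.5   # midpoint of the 4-5x/year estimate cited above
start_flop = 1e26     # hypothetical current frontier training budget (assumption)
target_flop = 1e28    # threshold named in the title

# Doubling time implied by the growth rate, in months
doubling_months = 12 / math.log2(ANNUAL_GROWTH)

# Years until the target budget is reached at this growth rate
years_to_target = math.log(target_flop / start_flop) / math.log(ANNUAL_GROWTH)

print(f"doubling time: {doubling_months:.1f} months")
print(f"years to 1e28 FLOP: {years_to_target:.1f}")
```

At 4.5x per year, compute doubles roughly every five and a half months, versus Moore's law's roughly two-year doubling cadence, which is why sustaining the trend requires more than hardware improvements alone.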
Read more at epochai.org