Google DeepMind Paper: 10,000+ LLM Runs Cost $12.9M to Replicate; Researcher Calculates Compute Costs for Hyperparameter Optimization Study

Calculating the Cost of a Google DeepMind Paper

Recently, GDM released a great paper titled Scaling Exponents Across Parameterizations and Optimizers, in which they conduct over 10,000 LLM training runs to obtain optimal hyperparameters under different regimes.

After reading it (it was great), I wanted to test my understanding of the paper by tallying up all the experiments conducted within and calculating the total compute cost it would take to replicate the paper.

Headline result

Subset | Sources of uncertainty | FLOPs | Costs @ $3/H100/hr
Alignment | N/A | 3.7e20 | $...