News Score: Score the News, Sort the News, Rewrite the Headlines

Calculating the Cost of a Google DeepMind Paper

Recently, GDM released a great paper titled Scaling Exponents Across Parameterizations and Optimizers, in which they conduct over 10,000 LLM training runs to obtain optimal hyperparameters under different regimes. After reading it (it was great), I wanted to test my understanding of the paper by tallying up all experiments conducted within, calculating the total compute cost it would take to replicate the paper.

Headline result

Subset | Sources of uncertainty | FLOPs | Costs @ $3/H100/hr
Alignment | N/A | 3.7e20 | $...
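The conversion from a FLOP count to a dollar figure at the $3/H100/hr rate can be sketched as follows. This is a minimal illustration, not the article's own calculation: the function name, the sustained-throughput parameter, and the utilization parameter are all assumptions introduced here, and the numbers in the usage lines are round placeholders rather than real H100 specifications or figures from the paper.

```python
def flops_to_usd(total_flops, peak_flops_per_sec, utilization, usd_per_gpu_hour=3.0):
    """Estimate rental cost for a given amount of compute.

    total_flops        -- total floating-point operations to perform
    peak_flops_per_sec -- assumed peak throughput of one GPU (hypothetical input)
    utilization        -- assumed fraction of peak actually achieved, in (0, 1]
    usd_per_gpu_hour   -- rental price per GPU-hour ($3 matches the excerpt's rate)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if peak_flops_per_sec <= 0:
        raise ValueError("peak_flops_per_sec must be positive")

    # Seconds of single-GPU time needed at the effective throughput.
    gpu_seconds = total_flops / (peak_flops_per_sec * utilization)
    gpu_hours = gpu_seconds / 3600.0
    return gpu_hours * usd_per_gpu_hour


# Placeholder inputs chosen only so the arithmetic is easy to follow:
# 3.6e18 FLOPs on a GPU sustaining 1e15 FLOP/s takes 3600 s, i.e. one GPU-hour.
one_hour_cost = flops_to_usd(3.6e18, peak_flops_per_sec=1e15, utilization=1.0)
half_util_cost = flops_to_usd(3.6e18, peak_flops_per_sec=1e15, utilization=0.5)
```

Halving the assumed utilization doubles the cost, which is why any dollar estimate of this kind depends heavily on the throughput and utilization values chosen.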

Read more at 152334h.github.io
