News Score: Score the News, Sort the News, Rewrite the Headlines

No More Adam: Learning Rate Scaling at Initialization is All You Need

Abstract: In this work, we question the necessity of adaptive gradient methods for training deep neural networks. We propose SGD-SaI, a simple yet effective enhancement to stochastic gradient descent with momentum (SGDM). SGD-SaI applies learning rate Scaling at Initialization (SaI) to distinct parameter groups, guided by their respective gradient signal-to-noise ratios (g-SNR). By adjusting learning rates without relying on adaptive second-order momentum, SGD-SaI helps prevent training imbalances...
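The abstract describes scaling per-group learning rates once, at initialization, by each group's gradient signal-to-noise ratio. A minimal NumPy sketch of that idea follows; the exact g-SNR definition and normalization here (mean-gradient norm over per-element standard deviation, scales normalized to the largest group) are illustrative assumptions, not the paper's precise formulation:

```python
import numpy as np

def g_snr(grads):
    # Gradient signal-to-noise ratio for one parameter group:
    # norm of the mean gradient divided by the norm of the per-element
    # standard deviation (one plausible reading of g-SNR).
    g = np.stack(grads)                       # (num_samples, num_params)
    signal = np.linalg.norm(g.mean(axis=0))
    noise = np.linalg.norm(g.std(axis=0)) + 1e-8
    return signal / noise

def sgd_sai_lrs(group_grads, base_lr=0.01):
    # Compute one static learning-rate scale per parameter group at
    # initialization; plain SGDM then runs with these fixed rates,
    # with no per-step adaptive second-order momentum.
    snrs = {name: g_snr(gs) for name, gs in group_grads.items()}
    max_snr = max(snrs.values())
    return {name: base_lr * snr / max_snr for name, snr in snrs.items()}
```

A group whose sampled gradients agree (high signal, low noise) keeps the full base learning rate, while a group with noisy, conflicting gradients is scaled down, which is how the method rebalances training without Adam-style running statistics.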

Read more at arxiv.org
