News Score: Score the News, Sort the News, Rewrite the Headlines

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Abstract: One puzzling artifact in machine learning dubbed grokking is where delayed generalization is achieved tenfolds of iterations after near perfect overfitting to the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under grokking phenomenon. By regarding a series of gradients of a parameter over training iterations as a random signal over time, we can spectrally dec...
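The excerpt is cut off before it describes the method, so the following is only an illustrative sketch of the idea named in the title: treat one parameter's gradients across iterations as a signal over time, isolate the slowly varying part, and amplify it. The exponential moving average used as the low-pass filter, the function name `amplify_slow_gradients`, and the defaults for `alpha` and `lam` are assumptions made for illustration and are not taken from the excerpt.

```python
def amplify_slow_gradients(grads, alpha=0.9, lam=2.0):
    """Add an amplified low-pass-filtered copy of a gradient sequence to itself.

    grads: gradient values of a single parameter, one per training iteration.
    alpha: EMA momentum (assumed value); closer to 1 keeps only slower variation.
    lam:   amplification factor (assumed value) for the slow component.
    """
    ema = 0.0  # running estimate of the slow component
    filtered = []
    for g in grads:
        # An exponential moving average acts as a simple low-pass filter.
        ema = alpha * ema + (1.0 - alpha) * g
        # Keep the raw gradient and add the amplified slow component.
        filtered.append(g + lam * ema)
    return filtered


# A constant (slow) gradient signal is boosted toward (1 + lam) times its value,
# while a sign-flipping (fast) signal is left almost unchanged.
slow = amplify_slow_gradients([1.0] * 200)
fast = amplify_slow_gradients([(-1.0) ** i for i in range(200)])
```

In a training loop, a filter of this kind would be applied per parameter to the gradients before the optimizer step; that placement is likewise an assumption here.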

Read more at arxiv.org
