Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Abstract: One puzzling artifact in machine learning, dubbed grokking, is delayed generalization, achieved only after tenfold or more additional iterations beyond near-perfect overfitting to the training data. Focusing on the long delay itself, on behalf of machine learning practitioners, our goal is to accelerate the generalization of a model under the grokking phenomenon. By regarding a series of gradients of a parameter over training iterations as a random signal over time, we can spectrally dec...
Read more at arxiv.org
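
The idea in the abstract and title, treating one parameter's gradients over training iterations as a signal over time and amplifying its slow part, can be illustrated with a small sketch. The abstract is truncated here, so the details below are assumptions rather than the authors' stated method: an exponential moving average (EMA) stands in as a simple low-pass filter, and the function name `amplify_slow_gradients` and the defaults `alpha` and `lam` are hypothetical.

```python
def amplify_slow_gradients(grads, alpha=0.9, lam=2.0):
    """Return gradients with their slow (low-frequency) component amplified.

    grads: list of floats, one gradient value per training iteration
    alpha: EMA momentum (hypothetical default); closer to 1 means a slower filter
    lam:   amplification factor for the slow component (hypothetical default)
    """
    ema = None
    out = []
    for g in grads:
        # An exponential moving average acts as a simple low-pass filter
        # (an assumed stand-in; the truncated abstract does not name a filter).
        ema = g if ema is None else alpha * ema + (1.0 - alpha) * g
        # Add the amplified slow component back onto the raw gradient.
        out.append(g + lam * ema)
    return out


# Toy gradient signal: a constant drift of 1.0 (slow) plus an alternating
# +5.0 / -5.0 term (fast).
grads = [1.0 + (5.0 if t % 2 == 0 else -5.0) for t in range(200)]
filtered = amplify_slow_gradients(grads)

# Compare the average (slow) component before and after filtering,
# skipping the first 100 steps so the EMA has settled.
mean_raw = sum(grads) / len(grads)
mean_filtered = sum(filtered[-100:]) / 100
print(mean_raw, mean_filtered)
```

In this toy signal the slow drift is scaled by roughly `1 + lam`, while the fast alternating term is changed only slightly, which is the sense in which the slow part of the gradient is amplified. In a real training loop the same update would be applied per parameter to the gradient before the optimizer step.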