News Score: Score the News, Sort the News, Rewrite the Headlines

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking

Abstract: While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kind of features will emerge, how and in which conditions it happens, and is closely related to the gradient dynamics of the training, for complex structured inputs. We propose a novel framework, named $\mathbf{Li_2}$, that captures three key stages for the grokking behavior...

Read more at arxiv.org
