News Score: Score the News, Sort the News, Rewrite the Headlines

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Abstract: We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-di...

Read more at arxiv.org
