Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Abstract: We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-di...
Read more at arxiv.org