Study finds that large language models internally encode human-judged problem difficulty, with a 0.88 correlation; steering models toward "easier" representations reduces errors and improves accuracy on math and coding tasks.

LLMs Encode How Difficult Problems Are

Abstract: Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones. We investigate whether LLMs internally encode problem difficulty in a way that aligns with human judgment, and whether this representation tracks generalization during reinforcement learning post-training. We train linear probes across layers and token positions on 60 models, evaluating on mathematical and coding subsets of Easy2...
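To make the probing-and-steering idea concrete, here is a minimal sketch of what a linear difficulty probe and a steering step could look like. It is not the authors' code. Several pieces are assumptions. The probe is a closed-form ridge regression from one layer's hidden states to scalar difficulty ratings. Agreement is scored with Pearson correlation, although the paper's exact correlation measure is not stated here. "Steering toward easier" is modeled as shifting activations against the unit-normalized probe direction. The activations and ratings are synthetic stand-ins, and the helper names (`fit_linear_probe`, `steer_toward_easier`, and so on) are hypothetical. In a real study you would fit one such probe per layer and token position.

```python
import numpy as np


def fit_linear_probe(H, y, l2=1.0):
    """Hypothetical ridge-regression probe: predict difficulty y from activations H.

    H: (n_samples, d_model) hidden states from one layer / token position.
    y: (n_samples,) difficulty ratings (assumed scalar, higher = harder).
    Returns weight vector w of shape (d_model,) and scalar bias b.
    """
    mu, y_mean = H.mean(axis=0), y.mean()
    Hc, yc = H - mu, y - y_mean
    d = H.shape[1]
    # Closed-form ridge solution on centered data: (Hc^T Hc + l2*I)^{-1} Hc^T yc
    w = np.linalg.solve(Hc.T @ Hc + l2 * np.eye(d), Hc.T @ yc)
    b = y_mean - mu @ w
    return w, b


def probe_predict(H, w, b):
    # Linear readout of predicted difficulty
    return H @ w + b


def pearson(a, b):
    # Pearson correlation (an assumed choice of agreement metric)
    return float(np.corrcoef(a, b)[0, 1])


def steer_toward_easier(H, w, alpha):
    """Shift activations against the unit probe direction.

    Because (w / ||w||) . w = ||w||, this lowers the probe's predicted
    difficulty by exactly alpha * ||w|| for every sample.
    """
    direction = w / np.linalg.norm(w)
    return H - alpha * direction


# --- Synthetic demo (all data hypothetical) ---
rng = np.random.default_rng(0)
n, d = 400, 32
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
difficulty = rng.uniform(1.0, 5.0, size=n)  # stand-in human ratings
# Activations that linearly encode difficulty along true_dir, plus noise
H = np.outer(difficulty, true_dir) + 0.3 * rng.normal(size=(n, d))

train, test = slice(0, 300), slice(300, n)
w, b = fit_linear_probe(H[train], difficulty[train])
r = pearson(probe_predict(H[test], w, b), difficulty[test])
print(f"held-out Pearson r = {r:.2f}")

H_easy = steer_toward_easier(H[test], w, alpha=1.0)
shift = probe_predict(H_easy, w, b) - probe_predict(H[test], w, b)
print(f"mean predicted-difficulty shift = {shift.mean():.3f}")
```

The ridge penalty keeps the solve well-conditioned when the hidden size is large relative to the number of labeled problems. The steering step only shows that the probe's own readout moves. Whether such a shift actually reduces a model's errors is the empirical question the study addresses, and this toy example does not reproduce it.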