Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark - NOUS RESEARCH
Large Reasoning Models (LRMs) employ a paradigm known as
test-time scaling, using reinforcement learning to teach
models to generate extended chains of thought (CoT) on
reasoning tasks. This extended reasoning improves their
problem-solving capabilities beyond what their base models
can achieve on their own.
While cost-efficiency trade-off curves (the "Pareto
frontier") typically plot model intelligence against cost
per million completion tokens, token efficiency, the number
of tokens...
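The cost comparison described above can be sketched briefly. The model names, token counts, and per-token prices below are illustrative assumptions, not figures from the article; the point is only that a model with a higher per-token price can still be cheaper per solved task if its chain of thought is shorter.

```python
# Sketch: comparing two hypothetical reasoning models on cost per task.
# All names, token counts, and prices here are invented for illustration.

def completion_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` completion tokens."""
    return tokens / 1_000_000 * price_per_million

# (completion tokens used to solve one task, $ per 1M completion tokens)
models = {
    "model_a": (12_000, 10.0),  # verbose chain of thought, cheaper tokens
    "model_b": (3_000, 15.0),   # terser reasoning, pricier per token
}

for name, (tokens, price) in models.items():
    cost = completion_cost(tokens, price)
    print(f"{name}: {tokens} tokens -> ${cost:.3f} per task")
```

Under these made-up numbers the terser model wins on cost per task despite its higher per-token price, which is the trade-off the article's notion of token efficiency is meant to capture.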
Read more at nousresearch.com