14B AI Model Matches Frontier Accuracy and Surpasses It on Calibration in Forecasting; Researchers Adapt RL Algorithms for Real-World Predictions

Outcome-based Reinforcement Learning to Predict the Future

Abstract: Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale...