News Score: Score the News, Sort the News, Rewrite the Headlines

Outcome-based Reinforcement Learning to Predict the Future

Abstract: Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale...

Read more at arxiv.org
