News Score: Score the News, Sort the News, Rewrite the Headlines

Reinforcement Pre-Training

Abstract: In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained with RL, where the model receives verifiable rewards for correctly predicting the next token for a given context. RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific anno...
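The core idea, a verifiable reward for next-token prediction, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation; the function names and the `predict` callable are assumptions:

```python
# Toy sketch of RPT's verifiable reward signal (illustrative only; names are
# assumptions, not taken from the paper). For each context prefix, a model
# predicts the next token; the reward is 1 only if the prediction exactly
# matches the ground-truth next token from the corpus.

def next_token_reward(predicted_token: str, ground_truth_token: str) -> int:
    """Verifiable reward: exact match against the corpus next token."""
    return 1 if predicted_token == ground_truth_token else 0

def rollout_rewards(tokens, predict):
    """Score next-token predictions over every prefix of a token sequence.

    `predict` is a hypothetical callable mapping a context (list of tokens)
    to a single predicted next token.
    """
    rewards = []
    for i in range(1, len(tokens)):
        context, target = tokens[:i], tokens[i]
        rewards.append(next_token_reward(predict(context), target))
    return rewards

# Example with a trivial "model" that always predicts "the"
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(rollout_rewards(tokens, lambda ctx: "the"))  # → [0, 0, 0, 1, 0]
```

Because the reward is computed directly from raw text, any corpus can serve as RL training data, which is what makes the scheme scalable relative to domain-specific annotated rewards.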

Read more at arxiv.org
