Researchers Unveil Reinforcement Pre-Training: New AI Paradigm Boosts Language Models' Accuracy Using RL

Reinforcement Pre-Training

Abstract: In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where the model receives verifiable rewards for correctly predicting the next token for a given context. RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific anno...
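The reward described in the abstract can be illustrated with a minimal Python sketch. It assumes the reward is a simple exact match between a predicted token id and the token that actually follows the context in the corpus; the abstract says only that the reward is verifiable, so this rule is an assumption. The function names `next_token_reward` and `score_rollouts` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def next_token_reward(predicted: int, target: int) -> float:
    """Return 1.0 if the predicted token id equals the corpus token, else 0.0.

    Assumption: exact match on token ids. The reward is checkable against
    the text itself, so no human annotation is needed.
    """
    return 1.0 if predicted == target else 0.0


def score_rollouts(
    tokens: Sequence[int], position: int, sampled_predictions: Sequence[int]
) -> List[float]:
    """Score several sampled predictions for the token at `position`.

    The context is tokens[:position]; the ground truth is tokens[position].
    """
    if not 0 < position < len(tokens):
        raise ValueError("position must leave a non-empty context and a target")
    target = tokens[position]
    return [next_token_reward(p, target) for p in sampled_predictions]


# Hypothetical token ids: context is [5, 9], the true next token is 2.
corpus_tokens = [5, 9, 2, 7]
rewards = score_rollouts(corpus_tokens, position=2, sampled_predictions=[2, 3, 2])
print(rewards)
```

Because the target comes straight from the text, any corpus position can serve as a training example, which is what lets this kind of reward scale to large amounts of unlabeled data.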