The upcoming GPT-3 moment for RL
Matthew Barnett, Tamay Besiroglu, Ege Erdil
Jun 20, 2025
GPT-3 showed that simply scaling up language models unlocks powerful, task-agnostic, few-shot performance, often outperforming carefully fine-tuned models. Before GPT-3, achieving state-of-the-art performance meant first pre-training models on large generic text corpora, then fine-tuning them on specific tasks.
Today’s reinforcement learning is stuck in a similar pre-GPT-3 paradigm. We first pre-train large models, and then painstakingly fine-tune them on narrow tasks.
Read more at mechanize.work