Researchers Prove Value-Based RL Algorithms Scale Predictably, Challenging Conventional Wisdom on Performance Forecasting

Value-Based Deep RL Scales Predictably

View PDF HTML (experimental) Abstract:Scaling data and compute is critical to the success of machine learning. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment. In this paper, we show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. First, we show that data a...