Unsloth Accelerates gpt-oss RL Training: 3x Faster Inference, 50% Less VRAM, 8x Longer Context

gpt-oss Reinforcement Learning | Unsloth Documentation

You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), the lowest VRAM use (50% less), and the longest context (8x longer) for gpt-oss RL compared with any other implementation, with no accuracy loss. Since RL on gpt-oss isn't yet vLLM compatible, we rewrote the Transformers inference code to deliver 3x faster inference for gpt-oss, at ~21 tokens/s. For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative...