Unsloth AI Unveils Efficient GRPO Algorithm: 90% Less VRAM for 10x Longer Context in AI Model Training

Long-context GRPO (R1 Reasoning)

Feb 20, 2025 • By Daniel & Michael

You can now train your own reasoning model with just 5GB VRAM for Qwen2.5 (1.5B) - down from 7GB in our previous GRPO release 2 weeks ago!

Currently, achieving longer context lengths is one of GRPO's biggest challenges. Our newly derived Unsloth Efficient GRPO algorithm enables 10x longer context lengths while using 90% less VRAM vs. all other GRPO LoRA/QLoRA implementations, even those utilizing Flash Attention 2 ...