News Score: Score the News, Sort the News, Rewrite the Headlines

Long-context GRPO (R1 Reasoning)

Feb 20, 2025 • By Daniel & Michael

You can now train your own reasoning model with just 5GB VRAM for Qwen2.5 (1.5B) - down from 7GB in our previous GRPO release 2 weeks ago! Currently, achieving longer context lengths is one of GRPO's biggest challenges. Our newly derived Unsloth Efficient GRPO algorithm enables 10x longer context lengths while using 90% less VRAM vs. all other GRPO LoRA/QLoRA implementations, even those utilizing Flash Attention 2 ...

Read more at unsloth.ai
