Huawei releases KVarN, a calibration-free KV-cache quantization backend for vLLM. It claims 3-5x more KV-cache (context) capacity and up to ~1.3x the throughput of FP16 at FP16-level accuracy, enabled with a single flag.

GitHub - huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
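
To make the "3-5x more context" figure concrete, here is a rough back-of-the-envelope sketch in Python. It uses the standard estimate of KV-cache size per token (keys plus values, per layer, per KV head) and simply scales the FP16 baseline by the claimed capacity factor. The model shape (32 layers, 8 KV heads, head dimension 128) and the 16 GiB cache budget are illustrative assumptions, not figures from the KVarN repository, and the helper names are hypothetical.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Approximate KV-cache bytes stored per token (FP16 = 2 bytes per value)."""
    # Factor of 2: each layer stores one key vector and one value vector.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def tokens_that_fit(budget_bytes: int, bytes_per_token: int) -> int:
    """Number of cached tokens that fit in a given memory budget."""
    return budget_bytes // bytes_per_token


# Assumed, illustrative model shape and cache budget (not from the source).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 32, 8, 128
BUDGET_BYTES = 16 * 1024**3  # 16 GiB reserved for the KV cache

fp16_bytes = kv_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
fp16_tokens = tokens_that_fit(BUDGET_BYTES, fp16_bytes)

print(f"FP16 baseline: {fp16_bytes} bytes/token, {fp16_tokens} tokens")
for factor in (3, 5):  # the claimed 3-5x capacity range
    print(f"{factor}x capacity: about {fp16_tokens * factor} tokens")
```

Under these assumed numbers the FP16 baseline is 128 KiB per token, or 131,072 tokens in 16 GiB, so a 3-5x capacity gain would correspond to roughly 393,000 to 655,000 cached tokens in the same budget. Real capacity depends on the model, the quantization metadata overhead, and how vLLM allocates cache blocks.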

⚡️ Built for agentic and long-context workloads.
💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent requests, with FP16-level accuracy.
🔌 Calibration-free, plug-and-play with vLLM. A native vLLM attention backend: add one flag, no model changes, no calibration.
🥊 Up to ~2.4x TurboQuant throughput, same capacity, higher accuracy.

Why KVarN (Variance Normalized KV-Cache)?
kvarn /kvɑːɳ/ · noun (Swedish) ...