GitHub - huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
⚡️ Built for agentic and long-context workloads.
💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent requests, with FP16-level accuracy.
🔌 Calibration-free, plug-and-play with vLLM. A native vLLM attention backend: add one flag, no model changes, no calibration.
🥊 Up to ~2.4x the throughput of TurboQuant at the same capacity, with higher accuracy.
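To make the capacity claim concrete, here is a rough back-of-the-envelope sketch of what a 3-5x larger KV cache means in tokens. The model shape (32 layers, 8 KV heads, head dimension 128) and the 16 GiB cache budget are hypothetical values chosen for illustration, not figures from KVarN, and the helper names are made up for this example. It simply scales the FP16 token count by the claimed factor and does not model KVarN's actual storage format.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token at a given element width (2 bytes = FP16)."""
    # Factor of 2 because both a key and a value vector are stored per head, per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(budget_bytes: int, per_token_bytes: int) -> int:
    """Whole tokens that fit in a fixed KV-cache memory budget."""
    return budget_bytes // per_token_bytes


# Hypothetical model shape and memory budget (assumptions, not KVarN numbers).
per_token_fp16 = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 16 * 1024**3  # 16 GiB reserved for the KV cache

fp16_tokens = tokens_that_fit(budget, per_token_fp16)

# Apply the claimed 3-5x capacity gain as a plain multiplier on the FP16 token count.
scaled_tokens = {factor: fp16_tokens * factor for factor in (3, 5)}

print(f"FP16 KV cache: {per_token_fp16} bytes/token, {fp16_tokens} tokens")
for factor, tokens in scaled_tokens.items():
    print(f"{factor}x capacity: {tokens} tokens")
```

Under these assumptions the same budget that holds about 131k tokens at FP16 would hold roughly 393k to 655k tokens, which can be spent on longer contexts, more concurrent requests, or a mix of both.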
Why KVarN (Variance Normalized KV-Cache)?
kvarn /kvɑːɳ/ · noun (Swedish)
...