China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude
In brief
Xiaomi and inference partner TileRT have broken 1,000 tokens per second on a 1-trillion-parameter model, a first at that scale, using a standard 8-GPU commodity node rather than custom chips.
The speed comes from FP4 quantization on the model's expert layers and DFlash speculative decoding, which proposes a full block of tokens in one pass instead of one at a time.
A limited API trial opens June 9 through June 23, priced at 3× standard MiMo rates for roughly 10× the generation speed.
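For readers curious how proposing a block of tokens can speed things up, here is a minimal sketch of generic draft-and-verify speculative decoding. It is not DFlash or TileRT code: the toy `target_next` and `draft_block` functions, the block size, and the token arithmetic are all hypothetical stand-ins, and the verification loop only emulates what a real system would do in a single batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

Token = int


def speculative_decode(
    target_next: Callable[[List[Token]], Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    prompt: List[Token],
    max_new_tokens: int,
    block_size: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding; returns (tokens, verification passes)."""
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        draft = draft_block(tokens, block_size)  # whole block proposed at once
        passes += 1  # counts as one pass of the large model per block
        accepted: List[Token] = []
        for tok in draft:
            # Emulates the per-position check a batched verify pass would do.
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], passes


# Hypothetical toy models: the "large" model counts upward modulo 50.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 50


# Hypothetical drafter: usually agrees, but guesses wrong at multiples of 7.
def draft_block(ctx: List[Token], k: int) -> List[Token]:
    out: List[Token] = []
    last = ctx[-1]
    for _ in range(k):
        nxt = (last + 1) % 50
        if nxt % 7 == 0:
            nxt = 0  # deliberate drafting mistake
        out.append(nxt)
        last = nxt
    return out


def plain_decode(prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one pass of the large model per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


if __name__ == "__main__":
    out, passes = speculative_decode(target_next, draft_block, [0], 20)
    print(out == plain_decode([0], 20), passes)
```

In this toy setup the output matches one-token-at-a-time decoding exactly, while the large model is consulted far fewer than 20 times. Real-world gains depend on how often the drafter's guesses are accepted, which the sketch does not attempt to model.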
Most peopl...
Read more at decrypt.co