Consistency diffusion language models: Up to 14x faster inference without sacrificing quality
Diffusion Language Models (DLMs) are emerging as a promising alternative to autoregressive (AR) LMs. Instead of generating one token at a time, DLMs iteratively refine a partially masked sequence over multiple sampling steps, gradually transforming a fully masked sequence into clean text. This refinement process creates a compelling opportunity: it enables parallel generation, allowing the model to finalize multiple tokens per iteration and potentially achieve higher throughput than AR decoding....
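To make the parallel-unmasking idea concrete, here is a minimal toy sketch of how a masked-diffusion sampler can finalize several tokens per iteration. The model, vocabulary size, and confidence-based selection rule below are all illustrative assumptions, not the CDLM method described in the post; a real DLM would predict a distribution for every masked position in one forward pass.

```python
import numpy as np

# Toy illustration of parallel unmasking in a masked-diffusion sampler.
# "toy_model" is a hypothetical stand-in that returns random per-position
# logits; it is NOT the actual Consistency DLM from the article.

VOCAB_SIZE = 32
MASK_ID = -1
SEQ_LEN = 16
TOKENS_PER_STEP = 4  # how many positions to finalize each iteration

rng = np.random.default_rng(0)

def toy_model(tokens):
    """Stand-in for a DLM: logits over the vocabulary for every position."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

def diffusion_sample(seq_len=SEQ_LEN, k=TOKENS_PER_STEP):
    tokens = np.full(seq_len, MASK_ID)              # start fully masked
    while (tokens == MASK_ID).any():
        logits = toy_model(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)                   # per-position confidence
        conf[tokens != MASK_ID] = -np.inf           # skip finalized positions
        # finalize the k most confident masked positions in parallel
        for pos in np.argsort(conf)[-k:]:
            if tokens[pos] == MASK_ID:
                tokens[pos] = probs[pos].argmax()
    return tokens

print(diffusion_sample())
```

With `TOKENS_PER_STEP = 4`, a 16-token sequence is produced in 4 refinement steps instead of 16 sequential decoding steps, which is the source of the potential throughput gain over AR decoding.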
Read more at together.ai