Consistency diffusion language models: Up to 14x faster inference without sacrificing quality
Diffusion Language Models (DLMs) are emerging as a promising alternative to autoregressive (AR) LMs. Instead of generating one token at a time, DLMs iteratively refine a partially masked sequence over multiple sampling steps, gradually transforming a fully masked sequence into clean text. This refinement process creates a compelling opportunity: it enables parallel generation, allowing the model to finalize multiple tokens per iteration and potentially achieve higher throughput than AR decoding....
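To make the parallel-unmasking idea concrete, here is a minimal toy sketch of how a masked-diffusion sampler can finalize several tokens per iteration. The model, vocabulary size, and confidence-based selection rule below are all illustrative assumptions, not the CDLM method described in the post; a real DLM would predict a distribution for every masked position in one forward pass.

```python
import numpy as np

# Toy illustration of parallel unmasking in a masked-diffusion sampler.
# "toy_model" is a hypothetical stand-in that returns random per-position
# logits; it is NOT the actual Consistency DLM from the article.

VOCAB_SIZE = 32
MASK_ID = -1
SEQ_LEN = 16
TOKENS_PER_STEP = 4  # how many positions to finalize each iteration

rng = np.random.default_rng(0)

def toy_model(tokens):
    """Stand-in for a DLM: logits over the vocabulary for every position."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

def diffusion_sample(seq_len=SEQ_LEN, k=TOKENS_PER_STEP):
    tokens = np.full(seq_len, MASK_ID)              # start fully masked
    while (tokens == MASK_ID).any():
        logits = toy_model(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)                   # per-position confidence
        conf[tokens != MASK_ID] = -np.inf           # skip finalized positions
        # finalize the k most confident masked positions in parallel
        for pos in np.argsort(conf)[-k:]:
            if tokens[pos] == MASK_ID:
                tokens[pos] = probs[pos].argmax()
    return tokens

print(diffusion_sample())
```

With `TOKENS_PER_STEP = 4`, a 16-token sequence is produced in 4 refinement steps instead of 16 sequential decoding steps, which is the source of the potential throughput gain over AR decoding.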
Read more at together.ai