Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
* Work done as a visiting student at MIT.
¹MIT
TL;DR: Diffusion Forcing combines the strengths of full-sequence
diffusion models and next-token models. At sampling time it can act
as either one, or as a mix of the two, for different applications
without retraining.
Abstract
This paper presents Diffusion Forcing, a new training paradigm
where a diffusion model is trained to denoise a set of tokens with
independent per-token noise levels. We apply Diffusion Forcing to
sequence generative modeling by training a ca...
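To make "independent per-token noise levels" concrete, here is a minimal sketch of the forward noising step only, not the authors' implementation. It assumes a DDPM-style linear beta schedule and a uniform draw of one noise level per token. The names `make_alpha_bar` and `noise_tokens_independently`, the schedule constants, and the tensor shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def make_alpha_bar(num_levels, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    Assumption: a DDPM-style schedule; the source does not specify one.
    """
    betas = np.linspace(beta_start, beta_end, num_levels)
    return np.cumprod(1.0 - betas)

def noise_tokens_independently(tokens, alpha_bar, rng):
    """Noise every token at its own, independently sampled noise level.

    tokens: array of shape (batch, seq_len, dim).
    Returns (noisy_tokens, levels, eps).
    """
    batch, seq_len, _ = tokens.shape
    # One level per token, rather than a single level shared by the sequence.
    levels = rng.integers(0, len(alpha_bar), size=(batch, seq_len))
    # Look up the retention factor per token; trailing axis broadcasts over dim.
    a = alpha_bar[levels][..., None]
    eps = rng.standard_normal(tokens.shape)
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps
    return noisy, levels, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(num_levels=1000)
tokens = rng.standard_normal((2, 8, 4))  # toy batch: 2 sequences, 8 tokens, 4 dims
noisy, levels, eps = noise_tokens_independently(tokens, alpha_bar, rng)
print(noisy.shape, levels.shape)
```

A denoiser trained on such inputs would receive `noisy` together with `levels` and be asked to recover the clean tokens or `eps`. Under this sketch, a conventional full-sequence diffusion setup corresponds to drawing a single level per sequence and repeating it across all tokens.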
Read more at boyuan.space