One-Minute Video Generation with Test-Time Training
Abstract
Transformers today still struggle to generate one-minute videos because self-attention layers are inefficient for long contexts.
Alternatives such as Mamba layers struggle with complex multi-scene stories because their hidden states are
less expressive.
We experiment with Test-Time Training (TTT) layers, whose hidden states can themselves be neural networks and are therefore more expressive.
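As a minimal sketch of the idea (not the paper's implementation), a TTT layer's hidden state is itself a small model whose parameters are updated by gradient descent on a self-supervised loss at every step, even during inference. The scalar inner model and reconstruction loss below are illustrative assumptions; the actual layers use neural-network hidden states:

```python
# Illustrative sketch of a Test-Time Training (TTT) layer.
# The hidden state is the weight of a tiny linear model f(x) = w * x,
# trained on a self-supervised reconstruction loss as tokens arrive.

def ttt_layer(tokens, lr=0.1):
    """Process a 1-D token sequence with a trainable scalar hidden model.

    Per-token self-supervised loss: (w * x - x)^2 (reconstruct the input).
    One gradient step on this loss updates the hidden state before each output.
    """
    w = 0.0  # hidden state: parameter of the inner model
    outputs = []
    for x in tokens:
        # d/dw of (w*x - x)^2 is 2 * (w*x - x) * x
        grad = 2.0 * (w * x - x) * x
        w -= lr * grad           # test-time training step
        outputs.append(w * x)    # layer output from the updated inner model
    return outputs, w

outs, w = ttt_layer([1.0, 1.0, 1.0, 1.0])
```

On this constant sequence the inner model's weight moves toward 1.0 with each token, showing how the hidden state adapts to the input it is processing rather than remaining a fixed-size summary.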
Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos from text
stor...
Read more at test-time-training.github.io