One-Minute Video Generation with Test-Time Training
Abstract
Transformers today still struggle to generate one-minute videos because self-attention layers are inefficient for long contexts.
Alternatives such as Mamba layers struggle with complex multi-scene stories because their hidden states are
less expressive.
We experiment with Test-Time Training (TTT) layers, whose hidden states can themselves be neural networks and are therefore more expressive.
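As a minimal sketch of the idea (not the paper's implementation), a TTT layer's hidden state is itself a small model whose parameters are updated by gradient descent on a self-supervised loss at every step, even during inference. The scalar inner model and reconstruction loss below are illustrative assumptions; the actual layers use neural-network hidden states:

```python
# Illustrative sketch of a Test-Time Training (TTT) layer.
# The hidden state is the weight of a tiny linear model f(x) = w * x,
# trained on a self-supervised reconstruction loss as tokens arrive.

def ttt_layer(tokens, lr=0.1):
    """Process a 1-D token sequence with a trainable scalar hidden model.

    Per-token self-supervised loss: (w * x - x)^2 (reconstruct the input).
    One gradient step on this loss updates the hidden state before each output.
    """
    w = 0.0  # hidden state: parameter of the inner model
    outputs = []
    for x in tokens:
        # d/dw of (w*x - x)^2 is 2 * (w*x - x) * x
        grad = 2.0 * (w * x - x) * x
        w -= lr * grad           # test-time training step
        outputs.append(w * x)    # layer output from the updated inner model
    return outputs, w

outs, w = ttt_layer([1.0, 1.0, 1.0, 1.0])
```

On this constant sequence the inner model's weight moves toward 1.0 with each token, showing how the hidden state adapts to the input it is processing rather than remaining a fixed-size summary.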
Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos from text
stor...
Read more at test-time-training.github.io