Google DeepMind's Veo 3: Video Models Emerge as Zero-Shot Learners, Mirroring LLMs in Visual AI

Video models are zero-shot learners and reasoners

Fascinating new paper from Google DeepMind which makes a very convincing case that their Veo 3 model - and generative video models in general - serve a similar role in the visual machine learning ecosystem as LLMs do for text. LLMs took the ability to predict the next token and turned it into general-purpose foundation models for all manner of tasks that used to be handled by dedicated models - summarization, translation, part-of-speech tagging...