Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos
Abstract: Image diffusion models, though originally developed for image generation, implicitly capture rich semantic structures that enable various recognition and localization tasks beyond synthesis. In this work, we investigate how their self-attention maps can be reinterpreted as semantic label propagation kernels, providing robust pixel-level correspondences between relevant image regions. Extending this mechanism across frames yields a temporal propagation kernel tha...
Read more at arxiv.org
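The abstract does not spell out how the kernel is built, so what follows is only a minimal sketch of the general idea of attention-as-label-propagation under stated assumptions. It is not the authors' method. It assumes per-pixel feature vectors are already available for each frame, standing in for the diffusion model's self-attention queries and keys. Each pixel in a target frame attends to every pixel in the previous frame with a scaled dot-product softmax, and those attention weights average the previous frame's label distributions. Repeating this frame by frame gives a temporal propagation. The names `propagate_labels`, `propagate_video`, and the `temperature` parameter are hypothetical, and the pure-Python loops are there only for readability.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = pixels; columns = feature dims or class probabilities


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_labels(query_feats: Matrix, key_feats: Matrix,
                     key_labels: Matrix, temperature: float = 1.0) -> Matrix:
    """Hypothetical attention-style label propagation.

    Each target pixel (query) attends over all reference pixels (keys) with a
    scaled dot-product softmax; the weights then average the reference
    pixels' label distributions.
    """
    scale = math.sqrt(len(query_feats[0])) * temperature
    num_classes = len(key_labels[0])
    out = []
    for q in query_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in key_feats]
        weights = softmax(scores)  # one row of the propagation kernel
        out.append([sum(w * lab[c] for w, lab in zip(weights, key_labels))
                    for c in range(num_classes)])
    return out


def propagate_video(frame_feats: List[Matrix], first_labels: Matrix,
                    temperature: float = 1.0) -> List[Matrix]:
    """Carry frame-0 labels forward; each frame attends to the previous one."""
    label_maps = [first_labels]
    for t in range(1, len(frame_feats)):
        label_maps.append(propagate_labels(frame_feats[t], frame_feats[t - 1],
                                           label_maps[-1], temperature))
    return label_maps


def argmax(probs: List[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Toy usage: two "pixels" with 2-D features that swap places in frame 1.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
first = [[1.0, 0.0], [0.0, 1.0]]  # one-hot labels for frame 0
maps = propagate_video(frames, first)
print([argmax(p) for p in maps[1]])  # [1, 0]
```

Because soft label distributions are fed forward, blur can accumulate over frames in this sketch. Lowering `temperature` sharpens each row of the kernel toward a nearest-neighbour match.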