Researchers Unveil I-JEPA: New Self-Supervised Learning Method for Semantic Image Representation, Achieves Strong Performance in 72 Hours

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semant...
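
To make the "single context block, several target blocks" idea concrete, here is a minimal sketch of the data flow. It is an illustration under stated assumptions, not the authors' implementation. The 14 x 14 patch grid, the block sizes, the number of targets, the removal of target patches from the context, and the squared-error loss are all choices made for this example. The function names `sample_block`, `sample_context_and_targets`, and `l2_loss` are hypothetical.

```python
import random


def sample_block(grid, height, width, rng):
    """Return the set of patch indices covered by a random height x width block."""
    top = rng.randrange(grid - height + 1)
    left = rng.randrange(grid - width + 1)
    return {
        row * grid + col
        for row in range(top, top + height)
        for col in range(left, left + width)
    }


def sample_context_and_targets(grid=14, num_targets=4, seed=0):
    """Pick several target blocks and one larger context block on a patch grid."""
    rng = random.Random(seed)
    # Assumed sizes: 4 x 4 targets and a 12 x 12 context on a 14 x 14 grid.
    targets = [sample_block(grid, 4, 4, rng) for _ in range(num_targets)]
    context = sample_block(grid, 12, 12, rng)
    # Assumption: hide target patches from the context so that predicting
    # them is not a trivial copy.
    for target in targets:
        context -= target
    return context, targets


def l2_loss(predicted, target):
    """Average squared distance between paired representation vectors.

    Stand-in loss for this sketch: the comparison happens between
    representations, not between pixels.
    """
    total = 0.0
    for p, t in zip(predicted, target):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(predicted)


if __name__ == "__main__":
    context, targets = sample_context_and_targets()
    print(len(context), [len(t) for t in targets])
```

In a full training loop, an encoder would embed the context patches, a predictor would output one vector per target patch, and a loss such as `l2_loss` would compare those outputs with the target patches' embeddings. Those components are omitted here because the text above does not specify them.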