GitHub Project Explains IJEPA Image Encoder Training: Code, Datasets, and Performance Tips for 300M-Parameter ViT-Small Model

GitHub - theAdamColton/elucidating-featurenorm-ijepa: Training IJEPA image encoders for the masses

Elucidating the Role of Feature Normalization in IJEPA [arxiv]

How to run our code and reproduce our results

We use uv for dependency management.

Download the training datasets and NYU-Depth tar files:

    uv run download_dataset.py

This requires roughly 100GB of storage space.

Run the default training configuration which trains a ~300m parameter ViT-Small with a patch size of 16 and a batch size of 320. This consumes ~22GB of VRAM and takes 116 hours (assuming validation logging is turned off):

    uv ...
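
Because the download step needs roughly 100GB, it may be worth checking free disk space before starting it. The following is a minimal sketch of such a check using only the Python standard library. The names `has_enough_space` and `REQUIRED_GB` are hypothetical helpers for illustration and are not part of this repository; the 100GB threshold is the rough figure quoted above, not an exact requirement.

```python
import shutil

# Rough storage figure quoted in the instructions above (an approximation).
REQUIRED_GB = 100


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    free_bytes = shutil.disk_usage(path).free
    # Compare in bytes, treating 1 GB as 10**9 bytes.
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free space for the dataset download.")
    else:
        print(f"Less than ~{REQUIRED_GB}GB free; consider freeing space first.")
```

Pass the directory where the datasets will be stored as `path`, since the check applies to whichever filesystem holds that directory.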