Vision Transformers Need Registers
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: representation, vision, transformer, register, SSL, CLIP, attention, attention map, interpretability, DINO, DINOv2
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We find artifacts in ViT features. We add new tokens (“registers”) that fix this is...
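As a rough illustration of what "adding new tokens" to a ViT input can look like, here is a minimal NumPy sketch. The preview above does not say how the registers are placed or used, so this is an assumption-laden sketch: it assumes the registers are appended to the patch-token sequence before the transformer blocks and dropped again afterwards. The function names (`add_registers`, `drop_registers`) and all shapes are hypothetical, and the transformer blocks themselves are omitted.

```python
import numpy as np


def add_registers(patch_tokens, registers):
    """Append register tokens to every sequence in the batch.

    patch_tokens: array of shape (batch, n_tokens, dim)
    registers:    array of shape (n_reg, dim), shared across the batch
    """
    batch = patch_tokens.shape[0]
    # Broadcast the shared registers to one copy per batch element.
    reg = np.broadcast_to(registers, (batch,) + registers.shape)
    return np.concatenate([patch_tokens, reg], axis=1)


def drop_registers(tokens, n_reg):
    """Discard the trailing register tokens (assumed unused at the output)."""
    return tokens[:, : tokens.shape[1] - n_reg]


rng = np.random.default_rng(0)
patches = rng.normal(size=(2, 196, 64))  # hypothetical: 14x14 patches, dim 64
registers = np.zeros((4, 64))            # hypothetical: 4 registers; these would be learned in a real model

x = add_registers(patches, registers)
# ... transformer blocks would process x here (omitted in this sketch) ...
out = drop_registers(x, n_reg=4)
```

In a trained model the register values would be learnable parameters rather than zeros; the sketch only shows the bookkeeping of extending and then trimming the token sequence.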
Read more at openreview.net