News Score: Score the News, Sort the News, Rewrite the Headlines

Vision Transformers Need Registers

Abstract: Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional...
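To make "high-norm tokens" concrete, here is a minimal sketch of how such tokens might be flagged in a feature map. It is not the paper's procedure: the function names `token_norms` and `high_norm_indices`, the toy embeddings, and the cutoff of three times the median norm are all assumptions chosen for illustration.

```python
import math
import statistics

def token_norms(tokens):
    """Return the L2 norm of each token embedding (each a list of floats)."""
    return [math.sqrt(sum(x * x for x in tok)) for tok in tokens]

def high_norm_indices(tokens, ratio=3.0):
    """Return indices of tokens whose norm exceeds `ratio` times the median norm.

    The median-based cutoff is an illustrative assumption, not the
    criterion used in the paper.
    """
    norms = token_norms(tokens)
    cutoff = ratio * statistics.median(norms)
    return [i for i, n in enumerate(norms) if n > cutoff]

# Toy feature map: four patch tokens with 3-dimensional embeddings.
patch_tokens = [
    [0.5, 0.1, -0.2],
    [0.3, -0.4, 0.1],
    [9.0, 12.0, 0.0],  # deliberately large token with norm 15
    [0.2, 0.2, 0.1],
]
outliers = high_norm_indices(patch_tokens)
```

In a real setting the embeddings would come from a ViT layer's output rather than a hand-written list, and the flagged indices could be mapped back to patch positions to check whether they fall in background regions.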

Read more at arxiv.org
