ViT-Up: Faithful Feature Upsampling for Vision Transformers
Paper: arXiv:2606.14024 (https://arxiv.org/abs/2606.14024)
ViT-Up is an implicit feature upsampler for Vision Transformers that predicts backbone-aligned features at arbitrary continuous image coordinates.
This repository provides pretrained ViT-Up weights for DINOv3-S+ and DINOv3-B.
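To make "features at arbitrary continuous image coordinates" concrete, the sketch below shows the general shape of such a query: a coarse patch-feature grid goes in together with a list of continuous (x, y) points, and one feature vector per point comes out. Plain bilinear interpolation stands in for the upsampler here. This is not ViT-Up's method or API; the function name `bilinear_sample`, the grid shape, and the normalized-coordinate convention are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(feats: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Bilinear baseline (NOT ViT-Up): sample a patch-feature grid at continuous points.

    feats:  (H, W, C) grid of patch features from a ViT backbone.
    coords: (N, 2) array of (x, y) positions, normalized to [0, 1] over the image
            (assumed convention for this sketch).
    Returns an (N, C) array with one interpolated feature vector per query point.
    """
    h, w, _ = feats.shape
    # Patch centers sit at (i + 0.5) / w, so shift by 0.5 to get grid-index space,
    # then clamp so queries near the border reuse the outermost patch.
    x = np.clip(coords[:, 0] * w - 0.5, 0, w - 1)
    y = np.clip(coords[:, 1] * h - 0.5, 0, h - 1)

    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel dimension.
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]

    top = feats[y0, x0] * (1 - wx) + feats[y0, x1] * wx
    bottom = feats[y1, x0] * (1 - wx) + feats[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical 4x4 grid of 8-dimensional patch features and three query points.
rng = np.random.default_rng(0)
patch_feats = rng.standard_normal((4, 4, 8))
queries = np.array([[0.10, 0.20], [0.50, 0.50], [0.93, 0.37]])
point_feats = bilinear_sample(patch_feats, queries)  # shape (3, 8)
```

A learned implicit upsampler would expose the same kind of interface (coordinates in, features out) while replacing the fixed interpolation weights with a prediction conditioned on the image and the backbone features.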
@misc{wandel2026vitupfaithfulfeatureupsampling,
  title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
  author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
  year={2026},
  eprint={2606.14024},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.14024},
}