Explanation of the method

#1
by giacomov - opened

Cool space! @EduardoPacheco do you have any pointers that explain the methodology used?

I looked into app.py expecting to see the extraction of the attention maps from the last layer, but instead I found this rather obscure piece of code:

with torch.no_grad():
    out = dino.forward_features(img_tensor)

features = out["x_prenorm"][:, 1:, :]

What does this last line do? What is "x_prenorm", and why are we skipping the first element of the second dimension? Is that the CLS token?

Thanks for your work!

Hey @giacomov, in this Space I'm using the original implementation that the authors provided, loaded through torch.hub. You can take a look at forward_features here

TL;DR

  • forward_features passes the input tensor through the ViT model
  • x_prenorm is the last hidden state from the ViT, before it is passed through the final LayerNorm
  • We skip the first token because it is the CLS token; only the image (patch) token embeddings are needed to make the visualizations (see the first sketch below)
  • The double PCA method I've used was mentioned in the paper, and people have also discussed it in the repo issues; here is a good discussion (a rough sketch follows below as well)
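
To make the shapes concrete, here is a minimal sketch of what the quoted snippet does, assuming the small DINOv2 checkpoint loaded through torch.hub (the model variant, image size, and preprocessing below are my assumptions, not necessarily what the Space uses):

import torch
from PIL import Image
from torchvision import transforms

dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
dino.eval()

# 448x448 input with patch size 14 -> a 32x32 grid of patch tokens
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
# "example.jpg" is just a placeholder path
img_tensor = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    out = dino.forward_features(img_tensor)

# out["x_prenorm"] has shape (batch, 1 + num_patches, dim):
# index 0 is the CLS token, the rest are the patch (image) tokens.
features = out["x_prenorm"][:, 1:, :]   # drop the CLS token
print(features.shape)                    # torch.Size([1, 1024, 384]) for vits14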
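
And a rough sketch of the two-stage ("double") PCA visualization as I understand it from the paper and the linked discussion; the thresholding and sign handling are assumptions and may differ from what the Space actually does:

import numpy as np
from sklearn.decomposition import PCA

patch_feats = features[0].cpu().numpy()      # (num_patches, dim)
grid = int(np.sqrt(patch_feats.shape[0]))    # e.g. 32 for a 32x32 patch grid

# Stage 1: project to a single component and threshold it to separate
# foreground patches from the background.
pc1 = PCA(n_components=1).fit_transform(patch_feats)[:, 0]
foreground = pc1 > 0   # the sign may need flipping depending on the image

# Stage 2: run a second PCA (3 components) on the foreground patches only,
# then min-max scale each component to [0, 1] and use it as an RGB color.
rgb = np.zeros((patch_feats.shape[0], 3))
fg = PCA(n_components=3).fit_transform(patch_feats[foreground])
fg = (fg - fg.min(axis=0)) / (fg.max(axis=0) - fg.min(axis=0) + 1e-8)
rgb[foreground] = fg

vis = rgb.reshape(grid, grid, 3)   # upsample this grid to overlay on the image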
