Image size mismatch (518 vs 224)

#9
by gaunernst - opened

From what I know, DINOv2 is pre-trained at 518x518 resolution and fine-tuned at 224x224 on ImageNet. This is the pre-trained version (without ImageNet fine-tuning), so shouldn't the image size be kept at 518x518?

In config.json, image_size is 518. But in preprocessor_config.json, crop_size is 224.

Yeah, the reason I set it to 224 is that the image processor usually carries the evaluation (inference) setting, which was 224 in the case of DINOv2 (as seen here, for instance).

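Feel free to set it to 518 if you want to use the image size used during pre-training. Note that crop_size has to be overridden along with size; otherwise the center crop brings the image back down to 224:

from transformers import AutoImageProcessor

# Override both the resize target and the center-crop size so the output is 518x518
image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base", size={"height": 518, "width": 518}, crop_size={"height": 518, "width": 518})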
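
To put the two sizes in perspective, here is a small sketch of how many tokens the ViT sees at each resolution. It assumes a patch size of 14 (the ViT-B/14 backbone) and a single [CLS] token. `token_count` is a helper name made up for this illustration, not part of transformers.

```python
# Assumption: patch size 14, as in the ViT-B/14 backbone behind facebook/dinov2-base.
PATCH_SIZE = 14


def token_count(image_size: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (patches per side, sequence length including the [CLS] token)."""
    if image_size % patch_size != 0:
        raise ValueError(f"{image_size} is not a multiple of patch size {patch_size}")
    per_side = image_size // patch_size
    # One token per patch, plus one [CLS] token.
    return per_side, per_side * per_side + 1


for size in (224, 518):
    per_side, seq_len = token_count(size)
    print(f"{size}x{size}: {per_side}x{per_side} patches, {seq_len} tokens")
# 224x224: 16x16 patches, 257 tokens
# 518x518: 37x37 patches, 1370 tokens
```

So 518 in config.json corresponds to a 37x37 grid of position embeddings, while a 224 input only yields a 16x16 grid. The sequence at 518 is roughly 5x longer, which costs noticeably more compute and memory per image.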
