---
library_name: keras
---
This model is a TensorFlow port of DINO [1] ViT B-16 [2]. The backbone was pre-trained with the DINO pretext task, after which the head layer was trained with the backbone kept frozen. The ImageNet-1k dataset was used for training. Refer to [this notebook](https://github.com/sayakpaul/probing-vits/blob/main/notebooks/load-dino-weights-vitb16.ipynb) to see how the porting was done.
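
Training a head on top of a frozen backbone can be sketched in Keras roughly as follows. This is only an illustration under stated assumptions, not the code used to produce this model: the tiny `toy_backbone`, the input size of 32, the 10 classes, and the random data are all hypothetical stand-ins for the real ViT B-16 backbone and ImageNet-1k.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a pre-trained backbone (NOT the real DINO ViT B-16).
backbone = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(16, activation="relu"),
    ],
    name="toy_backbone",
)
backbone.trainable = False  # freeze every backbone weight

# Attach a fresh linear head; only this layer will receive gradient updates.
inputs = tf.keras.Input(shape=(32,))
features = backbone(inputs, training=False)
logits = tf.keras.layers.Dense(10, name="head")(features)
probe = tf.keras.Model(inputs, logits)

probe.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Random data standing in for a labelled dataset (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 32)).astype("float32")
y = rng.integers(0, 10, size=(64,))

# Snapshot the weights so the effect of training can be inspected afterwards.
backbone_before = [w.copy() for w in backbone.get_weights()]
head_before = [w.copy() for w in probe.get_layer("head").get_weights()]

probe.fit(x, y, epochs=1, batch_size=16, verbose=0)

backbone_after = backbone.get_weights()
head_after = probe.get_layer("head").get_weights()
```

Comparing `backbone_before` with `backbone_after` shows that the frozen weights are untouched, while the kernel and bias of `head` are the only trainable weights in `probe`.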
## References
[1] Emerging Properties in Self-Supervised Vision Transformers: https://arxiv.org/abs/2104.14294
[2] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929