Commit a8101e1 by rwightman (HF staff)
1 parent: 9b34189

Update README.md

  1. README.md +59 -0
README.md CHANGED
@@ -6,3 +6,62 @@ library_name: timm
  license: cc-by-nc-4.0
  ---
  # Model card for vit_large_patch14_clip_336.datacompxl_ft_inat21
+
+ Part of a series of `timm` fine-tune experiments on iNaturalist 2021 competition data (https://github.com/visipedia/inat_comp/tree/master/2021) for higher capacity models.
+
+ Covering 10,000 species, this dataset and these models are fun to explore via the classification widget with pictures from your backyard, but they cover quite a bit fewer species than the models you can find on the iNaturalist website (https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa).
+
+ No extra metadata was used for training these models (as was the case for the competition); it was a straightforward fine-tune to explore differences in model pretraining data.
+
+ | Model | Top-1 (%) | Top-5 (%) | Paper |
+ |-------|-----------|-----------|-------|
+ | [eva02_large_patch14_clip_336.merged2b_ft_inat21](https://huggingface.co/timm/eva02_large_patch14_clip_336.merged2b_ft_inat21) | 92.05 | 98.01 | https://arxiv.org/abs/2303.11331 |
+ | [vit_large_patch14_clip_336.datacompxl_ft_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.datacompxl_ft_inat21) | 90.85 | 97.68 | https://arxiv.org/abs/2304.14108 |
+ | [vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21) | 90.29 | 97.44 | https://arxiv.org/abs/2212.07143 |
+
+ ## Fine-tune hparams
+
+ ```
+ ./distributed_train.sh 4 --data-dir /tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model vit_large_patch14_clip_224 --img-size 336 --model-kwargs img_size=336 --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 --warmup-lr 0 --sched-on-updates --clip-grad 1.0 --pretrained -b 48 --num-classes 10000 --grad-accum-steps 8 --layer-decay 0.8
+ ```
+
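The `--layer-decay 0.8` flag in the command above applies layer-wise learning-rate decay. As a rough sketch of the idea, and not necessarily `timm`'s exact parameter grouping, assume the 24 transformer blocks of a ViT-L plus the embeddings form depth groups, and that each group's learning rate is the base rate multiplied by the decay raised to its distance from the output. The function name `layer_lr_scales` and the grouping are illustrative assumptions.

```python
def layer_lr_scales(num_layers, decay):
    """Per-depth LR multipliers (illustrative sketch).

    Index 0 stands for the embeddings, index `num_layers` for the group
    closest to the output. The last group keeps the full learning rate;
    every step toward the input multiplies the rate by `decay`.
    """
    return [decay ** (num_layers - i) for i in range(num_layers + 1)]


# Assumption: 24 blocks (ViT-L depth); base LR and decay taken from the command.
base_lr = 5e-5
scales = layer_lr_scales(num_layers=24, decay=0.8)

print(f"embedding group lr: {base_lr * scales[0]:.3e}")
print(f"middle group lr:    {base_lr * scales[12]:.3e}")
print(f"last group lr:      {base_lr * scales[-1]:.3e}")
```

With a decay of 0.8 over this many groups, the earliest layers are updated with a learning rate a few hundred times smaller than the last ones, which keeps the pretrained low-level features close to their initial values.
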
+ ```
+ ./distributed_train.sh 4 --data-dir /home/ubuntu/ross/tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model eva02_large_patch14_clip_336 --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 --warmup-lr 0 --sched-on-updates --clip-grad 1.0 --pretrained -b 40 --num-classes 10000 --grad-accum-steps 10 --layer-decay 0.8 --torchcompile
+ ```
+
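The two commands use different per-process batch sizes and accumulation steps. A small sketch of the resulting global batch size, assuming the first argument to `distributed_train.sh` is the number of processes and `-b` is the batch size per process (the helper name `global_batch_size` is hypothetical):

```python
def global_batch_size(num_procs, per_proc_batch, grad_accum_steps):
    """Samples contributing to one optimizer step across all processes."""
    return num_procs * per_proc_batch * grad_accum_steps


# Values read from the two fine-tune commands above.
vit_batch = global_batch_size(num_procs=4, per_proc_batch=48, grad_accum_steps=8)
eva_batch = global_batch_size(num_procs=4, per_proc_batch=40, grad_accum_steps=10)

print("ViT-L global batch:", vit_batch)   # 4 * 48 * 8
print("EVA02-L global batch:", eva_batch) # 4 * 40 * 10
```

Under these assumptions the two runs land at similar global batch sizes (1536 and 1600), so the same `--lr 5e-5` is applied at a comparable scale in both.
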
+ ## Run Validation
+ ```
+ python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/eva02_large_patch14_clip_336.merged2b_ft_inat21 --split val --amp
+ ```
+
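For readers unfamiliar with the Top-1 and Top-5 figures that validation reports, here is a minimal pure-Python sketch of top-k accuracy. It is not the code `validate.py` runs; the function name `topk_accuracy` and the toy scores are made up for illustration.

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted by descending score, truncated to the top k.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += target in topk
    return 100.0 * hits / len(targets)


# Hypothetical scores for 3 samples over 4 classes.
scores = [
    [0.10, 0.70, 0.15, 0.05],  # highest: class 1
    [0.40, 0.30, 0.20, 0.10],  # highest: class 0, then class 1
    [0.05, 0.15, 0.30, 0.50],  # highest: class 3, then class 2
]
targets = [1, 1, 0]

top1 = topk_accuracy(scores, targets, k=1)
top2 = topk_accuracy(scores, targets, k=2)
print(f"top-1: {top1:.2f}%  top-2: {top2:.2f}%")
```

With 10,000 classes, Top-5 counts a prediction as correct if the true species appears anywhere in the five highest-scoring classes, which is why it is always at least as high as Top-1.
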
+ ## Citation
+
+ ```bibtex
+ @inproceedings{cherti2023reproducible,
+ title={Reproducible scaling laws for contrastive language-image learning},
+ author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
+ booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+ pages={2818--2829},
+ year={2023}
+ }
+ ```
+
+ ```bibtex
+ @article{datacomp,
+ title={DataComp: In search of the next generation of multimodal datasets},
+ author={Samir Yitzhak Gadre and Gabriel Ilharco and Alex Fang and Jonathan Hayase and Georgios Smyrnis and Thao Nguyen and Ryan Marten and Mitchell Wortsman and Dhruba Ghosh and Jieyu Zhang and Eyal Orgad and Rahim Entezari and Giannis Daras and Sarah Pratt and Vivek Ramanujan and Yonatan Bitton and Kalyani Marathe and Stephen Mussmann and Richard Vencu and Mehdi Cherti and Ranjay Krishna and Pang Wei Koh and Olga Saukh and Alexander Ratner and Shuran Song and Hannaneh Hajishirzi and Ali Farhadi and Romain Beaumont and Sewoong Oh and Alex Dimakis and Jenia Jitsev and Yair Carmon and Vaishaal Shankar and Ludwig Schmidt},
+ journal={arXiv preprint arXiv:2304.14108},
+ year={2023}
+ }
+ ```
+
+ ```bibtex
+ @article{EVA02,
+ title={EVA-02: A Visual Representation for Neon Genesis},
+ author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
+ journal={arXiv preprint arXiv:2303.11331},
+ year={2023}
+ }
+ ```