timm/vit_base_patch16_clip_224.laion2b_ft_in12k
Commit 0af730a: Update model config and README
rwightman committed (1 parent: 93e407e)

Files changed (2):
  1. README.md +160 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,7 +2,165 @@
 tags:
 - image-classification
 - timm
-- vision
-library_tag: timm
+library_name: timm
 license: apache-2.0
+datasets:
+- imagenet-12k
+- laion-2b
 ---
# Model card for vit_base_patch16_clip_224.laion2b_ft_in12k

A Vision Transformer (ViT) image classification model. Pretrained on LAION-2B image-text pairs using OpenCLIP. Fine-tuned on ImageNet-12k in `timm`. See recipes in [Reproducible scaling laws](https://arxiv.org/abs/2212.07143).

## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 94.9
  - GMACs: 16.9
  - Activations (M): 16.5
  - Image size: 224 x 224
- **Papers:**
  - OpenCLIP: https://github.com/mlfoundations/open_clip
  - Reproducible scaling laws for contrastive language-image learning: https://arxiv.org/abs/2212.07143
  - LAION-5B: An open large-scale dataset for training next generation image-text models: https://arxiv.org/abs/2210.08402
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
- **Dataset:** ImageNet-12k
- **Pretrain Dataset:** LAION-2B
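A quick way to sanity-check the parameter count above is to instantiate the architecture and count parameters directly; a minimal sketch (with `pretrained=False` no weights are downloaded, and the total should land near the 94.9M reported):

```python
import timm

# build the architecture only, no weight download
model = timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in12k', pretrained=False)
print(f'{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params')
```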
## Model Usage
### Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in12k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
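Continuing from the snippet above, the indices are ImageNet-12k class ids; a short sketch printing them alongside their probabilities (mapping ids to human-readable label names requires a separate label file, not shown here):

```python
# print the top-5 class ids and their softmax probabilities (in percent)
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f'class id {idx.item()}: {prob.item():.2f}%')
```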
### Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_base_patch16_clip_224.laion2b_ft_in12k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 197, 768) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
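Pooled embeddings like these are commonly used for image-to-image similarity. A minimal sketch continuing from the `num_classes=0` model above; the second input reuses `img` only to keep the example self-contained, so substitute any other PIL image:

```python
import torch.nn.functional as F

emb_a = model(transforms(img).unsqueeze(0))  # (1, 768) pooled embedding
emb_b = model(transforms(img).unsqueeze(0))  # substitute a different image here

sim = F.cosine_similarity(emb_a, emb_b)  # values in [-1, 1]
print(sim.item())  # 1.0 when comparing an image with itself
```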
## Model Comparison
Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
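To find related checkpoints to compare against, `timm` can enumerate pretrained variants matching a wildcard; a small sketch (the filter pattern is an assumption, adjust to taste):

```python
import timm

# list pretrained CLIP-derived ViT-B/16 models available in timm
print(timm.list_models('vit_base_patch16_clip*', pretrained=True))
```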
## Citation
```bibtex
@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}
```
```bibtex
@article{cherti2022reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  journal={arXiv preprint arXiv:2212.07143},
  year={2022}
}
```
```bibtex
@inproceedings{schuhmann2022laionb,
  title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
  author={Christoph Schuhmann and Romain Beaumont and Richard Vencu and Cade W Gordon and Ross Wightman and Mehdi Cherti and Theo Coombes and Aarush Katta and Clayton Mullis and Mitchell Wortsman and Patrick Schramowski and Srivatsa R Kundurthy and Katherine Crowson and Ludwig Schmidt and Robert Kaczmarczyk and Jenia Jitsev},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=M3Y74vmsMcY}
}
```
```bibtex
@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}
```
```bibtex
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
```
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34203fc6b0dc9fe49056abc9ca8d9bcc21ecc3867eb1708793a6b48c2cf3a26b
+size 379573260