rwightman HF staff committed on
Commit
4997d44
1 Parent(s): 3a463ee

Update model config and README

Files changed (2)
  1. README.md +162 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,7 +2,167 @@
  tags:
  - image-classification
  - timm
- - vision
- library_tag: timm
+ library_name: timm
  license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - laion-2b
+ - imagenet-12k
  ---
+ # Model card for vit_base_patch32_clip_384.laion2b_ft_in12k_in1k
+
+ A Vision Transformer (ViT) image classification model. Pretrained on LAION-2B image-text pairs using OpenCLIP. Fine-tuned on ImageNet-12k and then ImageNet-1k in `timm`. See recipes in [Reproducible scaling laws](https://arxiv.org/abs/2212.07143).
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 88.3
+   - GMACs: 12.7
+   - Activations (M): 12.1
+   - Image size: 384 x 384
+ - **Papers:**
+   - OpenCLIP: https://github.com/mlfoundations/open_clip
+   - Reproducible scaling laws for contrastive language-image learning: https://arxiv.org/abs/2212.07143
+   - LAION-5B: An open large-scale dataset for training next generation image-text models: https://arxiv.org/abs/2210.08402
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-1k
+ - **Pretrain Dataset:**
+   - LAION-2B
+   - ImageNet-12k
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # required for torch.topk below
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_base_patch32_clip_384.laion2b_ft_in12k_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_base_patch32_clip_384.laion2b_ft_in12k_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 145, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @software{ilharco_gabriel_2021_5143773,
+   author = {Ilharco, Gabriel and
+             Wortsman, Mitchell and
+             Wightman, Ross and
+             Gordon, Cade and
+             Carlini, Nicholas and
+             Taori, Rohan and
+             Dave, Achal and
+             Shankar, Vaishaal and
+             Namkoong, Hongseok and
+             Miller, John and
+             Hajishirzi, Hannaneh and
+             Farhadi, Ali and
+             Schmidt, Ludwig},
+   title = {OpenCLIP},
+   month = jul,
+   year = 2021,
+   note = {If you use this software, please cite it as below.},
+   publisher = {Zenodo},
+   version = {0.1},
+   doi = {10.5281/zenodo.5143773},
+   url = {https://doi.org/10.5281/zenodo.5143773}
+ }
+ ```
+ ```bibtex
+ @article{cherti2022reproducible,
+   title={Reproducible scaling laws for contrastive language-image learning},
+   author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
+   journal={arXiv preprint arXiv:2212.07143},
+   year={2022}
+ }
+ ```
+ ```bibtex
+ @inproceedings{schuhmann2022laionb,
+   title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
+   author={Christoph Schuhmann and
+           Romain Beaumont and
+           Richard Vencu and
+           Cade W Gordon and
+           Ross Wightman and
+           Mehdi Cherti and
+           Theo Coombes and
+           Aarush Katta and
+           Clayton Mullis and
+           Mitchell Wortsman and
+           Patrick Schramowski and
+           Srivatsa R Kundurthy and
+           Katherine Crowson and
+           Ludwig Schmidt and
+           Robert Kaczmarczyk and
+           Jenia Jitsev},
+   booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
+   year={2022},
+   url={https://openreview.net/forum?id=M3Y74vmsMcY}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
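Note on the `(1, 145, 768)` shape quoted in the Image Embeddings example of README.md: the middle dimension appears to follow from the patch arithmetic for a 384 x 384 input with 32 x 32 patches. The sketch below is only an illustration under stated assumptions (square input, non-overlapping patches, one class token prepended); the helper name `vit_token_count` is hypothetical and is not part of `timm`.

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Count the tokens a plain ViT would see for a square image.

    Assumptions (not taken from the timm source): patches do not overlap,
    and num_prefix_tokens=1 stands for a single class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + num_prefix_tokens


# 384 / 32 = 12 patches per side, 12 * 12 = 144 patches, plus 1 class token
tokens = vit_token_count(384, 32)
print(tokens)  # 145
```

The last dimension (768) is the embedding width stated in the README comment and is not derived here.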
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c573607942825395eb1be9fcc51a0e8165612c0e4f0f2249c5610c47fef87826
+ size 353206012
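Note on the pointer above: the three added lines are a Git LFS pointer, not the weights themselves. A minimal sketch of how such a pointer could be parsed and used to check a downloaded file follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c573607942825395eb1be9fcc51a0e8165612c0e4f0f2249c5610c47fef87826
size 353206012
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then split the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash named in the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['size'] / 1e6:.1f} MB")  # 353.2 MB
```

Usage would look like `matches_pointer(Path("model.safetensors"), pointer)` on a locally downloaded copy. The size is consistent with roughly 88.3M parameters stored at 4 bytes each, which matches the parameter count in the README, though the storage dtype is an inference and is not stated in the commit.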