rwightman (HF staff) committed
Commit 75ecbb9
1 Parent(s): 1171351
Files changed (3)
  1. README.md +308 -0
  2. config.json +35 -0
  3. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,308 @@
1
+ ---
2
+ tags:
3
+ - image-classification
4
+ - timm
5
+ library_name: timm
6
+ license: apache-2.0
7
+ datasets:
8
+ - imagenet-1k
9
+ - laion-2b
10
+ ---
11
+ # Model card for convnext_large_mlp.clip_laion2b_augreg_ft_in1k
12
+
13
+ A ConvNeXt image classification model. The CLIP image tower weights were pretrained in [OpenCLIP](https://github.com/mlfoundations/open_clip) on LAION-2B and fine-tuned on ImageNet-1k in `timm` by Ross Wightman.
14
+
15
+ See the related OpenCLIP model cards for more details on pretraining:
16
+ * https://huggingface.co/laion/CLIP-convnext_large_d.laion2B-s26B-b102K-augreg
17
+ * https://huggingface.co/laion/CLIP-convnext_base_w-laion2B-s13B-b82K-augreg
18
+ * https://huggingface.co/laion/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K
19
+
20
+
21
+ ## Model Details
22
+ - **Model Type:** Image classification / feature backbone
23
+ - **Model Stats:**
24
+ - Params (M): 200.1
25
+ - GMACs: 44.9
26
+ - Activations (M): 56.3
27
+ - Image size: 256 x 256
28
+ - **Papers:**
29
+ - LAION-5B: An open large-scale dataset for training next generation image-text models: https://arxiv.org/abs/2210.08402
30
+ - A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545
31
+ - Learning Transferable Visual Models From Natural Language Supervision: https://arxiv.org/abs/2103.00020
32
+ - **Original:** https://github.com/mlfoundations/open_clip
33
+ - **Pretrain Dataset:** LAION-2B
34
+ - **Dataset:** ImageNet-1k
35
+
36
+ ## Model Usage
37
+ ### Image Classification
38
+ ```python
39
+ from urllib.request import urlopen
40
+ from PIL import Image
41
+ import timm
42
+ import torch  # required for torch.topk below
43
+ img = Image.open(
44
+ urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
45
+
46
+ model = timm.create_model('convnext_large_mlp.clip_laion2b_augreg_ft_in1k', pretrained=True)
47
+ model = model.eval()
48
+
49
+ # get model specific transforms (normalization, resize)
50
+ data_config = timm.data.resolve_model_data_config(model)
51
+ transforms = timm.data.create_transform(**data_config, is_training=False)
52
+
53
+ output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
54
+
55
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
56
+ ```
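The last line of the snippet above converts the raw logits into percentages and keeps the five largest. As a rough illustration of what that step computes, here is a dependency-free sketch in plain Python. The helper names `softmax` and `top_k` and the toy logits are made up for this example; they are not part of `timm` or PyTorch.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Toy logits for a hypothetical 6-class problem (not real model output).
logits = [1.0, 3.0, 0.5, 2.0, -1.0, 0.0]
percentages = [p * 100 for p in softmax(logits)]
top_values, top_indices = top_k(percentages, k=5)
print(top_indices)
```

With the real model, the indices returned by `torch.topk` are ImageNet-1k class ids in the range 0 to 999, matching `num_classes: 1000` in `config.json`.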
57
+
58
+ ### Feature Map Extraction
59
+ ```python
60
+ from urllib.request import urlopen
61
+ from PIL import Image
62
+ import timm
63
+
64
+ img = Image.open(
65
+ urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
66
+
67
+ model = timm.create_model(
68
+ 'convnext_large_mlp.clip_laion2b_augreg_ft_in1k',
69
+ pretrained=True,
70
+ features_only=True,
71
+ )
72
+ model = model.eval()
73
+
74
+ # get model specific transforms (normalization, resize)
75
+ data_config = timm.data.resolve_model_data_config(model)
76
+ transforms = timm.data.create_transform(**data_config, is_training=False)
77
+
78
+ output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
79
+
80
+ for o in output:
81
+ # print shape of each feature map in output
82
+ # e.g. for convnext_base at 224 x 224 input:
83
+ # torch.Size([1, 128, 56, 56])
84
+ # torch.Size([1, 256, 28, 28])
85
+ # torch.Size([1, 512, 14, 14])
86
+ # torch.Size([1, 1024, 7, 7])
87
+ print(o.shape)
88
+ ```
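The shapes in the comment above are for `convnext_base`, not for this model. The sketch below estimates what to expect here. It assumes the usual ConvNeXt layout (feature maps at reductions 4, 8, 16 and 32) and the ConvNeXt-Large stage widths 192, 384, 768 and 1536, the last of which matches `num_features: 1536` in `config.json`. The helper `expected_feature_shapes` is hypothetical and only does the arithmetic; it does not call `timm`.

```python
def expected_feature_shapes(img_size, channels, reductions=(4, 8, 16, 32), batch=1):
    """Predict (N, C, H, W) per stage, assuming img_size is divisible by each reduction."""
    return [(batch, c, img_size // r, img_size // r) for c, r in zip(channels, reductions)]


# Assumed stage widths: ConvNeXt-Base first, then ConvNeXt-Large.
base_224 = expected_feature_shapes(224, (128, 256, 512, 1024))
large_256 = expected_feature_shapes(256, (192, 384, 768, 1536))

for shape in large_256:
    print(shape)
```

The final 8 x 8 map at a 256 x 256 input agrees with `pool_size: [8, 8]` in `config.json`. Printing `o.shape` on the real outputs remains the authoritative check.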
89
+
90
+ ### Image Embeddings
91
+ ```python
92
+ from urllib.request import urlopen
93
+ from PIL import Image
94
+ import timm
95
+
96
+ img = Image.open(
97
+ urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
98
+
99
+ model = timm.create_model(
100
+ 'convnext_large_mlp.clip_laion2b_augreg_ft_in1k',
101
+ pretrained=True,
102
+ num_classes=0, # remove classifier nn.Linear
103
+ )
104
+ model = model.eval()
105
+
106
+ # get model specific transforms (normalization, resize)
107
+ data_config = timm.data.resolve_model_data_config(model)
108
+ transforms = timm.data.create_transform(**data_config, is_training=False)
109
+
110
+ output = model(transforms(img).unsqueeze(0)) # output is (batch_size, num_features) shaped tensor
111
+
112
+ # or equivalently (without needing to set num_classes=0)
113
+
114
+ output = model.forward_features(transforms(img).unsqueeze(0))
115
+ # output is unpooled, i.e. a (batch_size, num_features, H, W) tensor
116
+
117
+ output = model.forward_head(output, pre_logits=True)
118
+ # output is (batch_size, num_features) tensor
119
+ ```
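One common, though not the only, use of pooled embeddings like these is to compare images by cosine similarity. The sketch below shows the computation on short toy vectors in plain Python. The function name `cosine_similarity` and the vectors are illustrative assumptions; real embeddings from this model would have 1536 entries per image.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" (not real model output).
emb_a = [0.2, 0.1, 0.7, 0.0]
emb_b = [0.4, 0.2, 1.4, 0.0]  # same direction as emb_a, twice the length
emb_c = [0.0, 0.0, 0.0, 1.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

Because the measure ignores vector length, `emb_a` and `emb_b` count as identical while `emb_a` and `emb_c` count as unrelated.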
120
+
121
+ ## Model Comparison
122
+ ### By Top-1
123
+ All timing numbers from eager mode PyTorch 1.13 on RTX 3090 w/ AMP. `param_count` and `macts` (activations) are in millions.
124
+
125
+ |model |top1 |top5 |img_size|param_count|gmacs |macts |samples_per_sec|batch_size|
126
+ |----------------------------------------------|------|------|--------|-----------|------|------|---------------|----------|
127
+ |[convnextv2_huge.fcmae_ft_in22k_in1k_512](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_512)|88.848|98.742|512 |660.29 |600.81|413.07|28.58 |48 |
128
+ |[convnextv2_huge.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_384)|88.668|98.738|384 |660.29 |337.96|232.35|50.56 |64 |
129
+ |[convnextv2_large.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k_384)|88.196|98.532|384 |197.96 |101.1 |126.74|128.94 |128 |
130
+ |[convnext_xlarge.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k_384)|87.75 |98.556|384 |350.2 |179.2 |168.99|124.85 |192 |
131
+ |[convnextv2_base.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k_384)|87.646|98.422|384 |88.72 |45.21 |84.49 |209.51 |256 |
132
+ |[convnext_large.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k_384)|87.476|98.382|384 |197.77 |101.1 |126.74|194.66 |256 |
133
+ |[convnext_large_mlp.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_augreg_ft_in1k)|87.344|98.218|256 |200.13 |44.94 |56.33 |438.08 |256 |
134
+ |[convnextv2_large.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k)|87.26 |98.248|224 |197.96 |34.4 |43.13 |376.84 |256 |
135
+ |[convnext_xlarge.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k)|87.002|98.208|224 |350.2 |60.98 |57.5 |368.01 |256 |
136
+ |[convnext_base.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k_384)|86.796|98.264|384 |88.59 |45.21 |84.49 |366.54 |256 |
137
+ |[convnextv2_base.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k)|86.74 |98.022|224 |88.72 |15.38 |28.75 |624.23 |256 |
138
+ |[convnext_large.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k)|86.636|98.028|224 |197.77 |34.4 |43.13 |581.43 |256 |
139
+ |[convnext_base.clip_laiona_augreg_ft_in1k_384](https://huggingface.co/timm/convnext_base.clip_laiona_augreg_ft_in1k_384)|86.504|97.97 |384 |88.59 |45.21 |84.49 |368.14 |256 |
140
+ |[convnextv2_huge.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in1k)|86.256|97.75 |224 |660.29 |115.0 |79.07 |154.72 |256 |
141
+ |[convnext_small.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_small.in12k_ft_in1k_384)|86.182|97.92 |384 |50.22 |25.58 |63.37 |516.19 |256 |
142
+ |[convnext_base.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_base.clip_laion2b_augreg_ft_in1k)|86.154|97.68 |256 |88.59 |20.09 |37.55 |819.86 |256 |
143
+ |[convnext_base.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k)|85.822|97.866|224 |88.59 |15.38 |28.75 |1037.66 |256 |
144
+ |[convnext_small.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k_384)|85.778|97.886|384 |50.22 |25.58 |63.37 |518.95 |256 |
145
+ |[convnextv2_large.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in1k)|85.742|97.584|224 |197.96 |34.4 |43.13 |375.23 |256 |
146
+ |[convnext_small.in12k_ft_in1k](https://huggingface.co/timm/convnext_small.in12k_ft_in1k)|85.174|97.506|224 |50.22 |8.71 |21.56 |1474.31 |256 |
147
+ |[convnext_tiny.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k_384)|85.118|97.608|384 |28.59 |13.14 |39.48 |856.76 |256 |
148
+ |[convnextv2_tiny.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k_384)|85.112|97.63 |384 |28.64 |13.14 |39.48 |491.32 |256 |
149
+ |[convnextv2_base.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in1k)|84.874|97.09 |224 |88.72 |15.38 |28.75 |625.33 |256 |
150
+ |[convnext_small.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k)|84.562|97.394|224 |50.22 |8.71 |21.56 |1478.29 |256 |
151
+ |[convnext_large.fb_in1k](https://huggingface.co/timm/convnext_large.fb_in1k)|84.282|96.892|224 |197.77 |34.4 |43.13 |584.28 |256 |
152
+ |[convnext_tiny.in12k_ft_in1k](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k)|84.186|97.124|224 |28.59 |4.47 |13.44 |2433.7 |256 |
153
+ |[convnext_tiny.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k_384)|84.084|97.14 |384 |28.59 |13.14 |39.48 |862.95 |256 |
154
+ |[convnextv2_tiny.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k)|83.894|96.964|224 |28.64 |4.47 |13.44 |1452.72 |256 |
155
+ |[convnext_base.fb_in1k](https://huggingface.co/timm/convnext_base.fb_in1k)|83.82 |96.746|224 |88.59 |15.38 |28.75 |1054.0 |256 |
156
+ |[convnextv2_nano.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k_384)|83.37 |96.742|384 |15.62 |7.22 |24.61 |801.72 |256 |
157
+ |[convnext_small.fb_in1k](https://huggingface.co/timm/convnext_small.fb_in1k)|83.142|96.434|224 |50.22 |8.71 |21.56 |1464.0 |256 |
158
+ |[convnextv2_tiny.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in1k)|82.92 |96.284|224 |28.64 |4.47 |13.44 |1425.62 |256 |
159
+ |[convnext_tiny.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k)|82.898|96.616|224 |28.59 |4.47 |13.44 |2480.88 |256 |
160
+ |[convnext_nano.in12k_ft_in1k](https://huggingface.co/timm/convnext_nano.in12k_ft_in1k)|82.282|96.344|224 |15.59 |2.46 |8.37 |3926.52 |256 |
161
+ |[convnext_tiny_hnf.a2h_in1k](https://huggingface.co/timm/convnext_tiny_hnf.a2h_in1k)|82.216|95.852|224 |28.59 |4.47 |13.44 |2529.75 |256 |
162
+ |[convnext_tiny.fb_in1k](https://huggingface.co/timm/convnext_tiny.fb_in1k)|82.066|95.854|224 |28.59 |4.47 |13.44 |2346.26 |256 |
163
+ |[convnextv2_nano.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k)|82.03 |96.166|224 |15.62 |2.46 |8.37 |2300.18 |256 |
164
+ |[convnextv2_nano.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in1k)|81.83 |95.738|224 |15.62 |2.46 |8.37 |2321.48 |256 |
165
+ |[convnext_nano_ols.d1h_in1k](https://huggingface.co/timm/convnext_nano_ols.d1h_in1k)|80.866|95.246|224 |15.65 |2.65 |9.38 |3523.85 |256 |
166
+ |[convnext_nano.d1h_in1k](https://huggingface.co/timm/convnext_nano.d1h_in1k)|80.768|95.334|224 |15.59 |2.46 |8.37 |3915.58 |256 |
167
+ |[convnextv2_pico.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_pico.fcmae_ft_in1k)|80.304|95.072|224 |9.07 |1.37 |6.1 |3274.57 |256 |
168
+ |[convnext_pico.d1_in1k](https://huggingface.co/timm/convnext_pico.d1_in1k)|79.526|94.558|224 |9.05 |1.37 |6.1 |5686.88 |256 |
169
+ |[convnext_pico_ols.d1_in1k](https://huggingface.co/timm/convnext_pico_ols.d1_in1k)|79.522|94.692|224 |9.06 |1.43 |6.5 |5422.46 |256 |
170
+ |[convnextv2_femto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_femto.fcmae_ft_in1k)|78.488|93.98 |224 |5.23 |0.79 |4.57 |4264.2 |256 |
171
+ |[convnext_femto_ols.d1_in1k](https://huggingface.co/timm/convnext_femto_ols.d1_in1k)|77.86 |93.83 |224 |5.23 |0.82 |4.87 |6910.6 |256 |
172
+ |[convnext_femto.d1_in1k](https://huggingface.co/timm/convnext_femto.d1_in1k)|77.454|93.68 |224 |5.22 |0.79 |4.57 |7189.92 |256 |
173
+ |[convnextv2_atto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_atto.fcmae_ft_in1k)|76.664|93.044|224 |3.71 |0.55 |3.81 |4728.91 |256 |
174
+ |[convnext_atto_ols.a2_in1k](https://huggingface.co/timm/convnext_atto_ols.a2_in1k)|75.88 |92.846|224 |3.7 |0.58 |4.11 |7963.16 |256 |
175
+ |[convnext_atto.d2_in1k](https://huggingface.co/timm/convnext_atto.d2_in1k)|75.664|92.9 |224 |3.7 |0.55 |3.81 |8439.22 |256 |
176
+
177
+ ### By Throughput (samples / sec)
178
+ All timing numbers from eager mode PyTorch 1.13 on RTX 3090 w/ AMP.
179
+
180
+ |model |top1 |top5 |img_size|param_count|gmacs |macts |samples_per_sec|batch_size|
181
+ |----------------------------------------------|------|------|--------|-----------|------|------|---------------|----------|
182
+ |[convnext_atto.d2_in1k](https://huggingface.co/timm/convnext_atto.d2_in1k)|75.664|92.9 |224 |3.7 |0.55 |3.81 |8439.22 |256 |
183
+ |[convnext_atto_ols.a2_in1k](https://huggingface.co/timm/convnext_atto_ols.a2_in1k)|75.88 |92.846|224 |3.7 |0.58 |4.11 |7963.16 |256 |
184
+ |[convnext_femto.d1_in1k](https://huggingface.co/timm/convnext_femto.d1_in1k)|77.454|93.68 |224 |5.22 |0.79 |4.57 |7189.92 |256 |
185
+ |[convnext_femto_ols.d1_in1k](https://huggingface.co/timm/convnext_femto_ols.d1_in1k)|77.86 |93.83 |224 |5.23 |0.82 |4.87 |6910.6 |256 |
186
+ |[convnext_pico.d1_in1k](https://huggingface.co/timm/convnext_pico.d1_in1k)|79.526|94.558|224 |9.05 |1.37 |6.1 |5686.88 |256 |
187
+ |[convnext_pico_ols.d1_in1k](https://huggingface.co/timm/convnext_pico_ols.d1_in1k)|79.522|94.692|224 |9.06 |1.43 |6.5 |5422.46 |256 |
188
+ |[convnextv2_atto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_atto.fcmae_ft_in1k)|76.664|93.044|224 |3.71 |0.55 |3.81 |4728.91 |256 |
189
+ |[convnextv2_femto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_femto.fcmae_ft_in1k)|78.488|93.98 |224 |5.23 |0.79 |4.57 |4264.2 |256 |
190
+ |[convnext_nano.in12k_ft_in1k](https://huggingface.co/timm/convnext_nano.in12k_ft_in1k)|82.282|96.344|224 |15.59 |2.46 |8.37 |3926.52 |256 |
191
+ |[convnext_nano.d1h_in1k](https://huggingface.co/timm/convnext_nano.d1h_in1k)|80.768|95.334|224 |15.59 |2.46 |8.37 |3915.58 |256 |
192
+ |[convnext_nano_ols.d1h_in1k](https://huggingface.co/timm/convnext_nano_ols.d1h_in1k)|80.866|95.246|224 |15.65 |2.65 |9.38 |3523.85 |256 |
193
+ |[convnextv2_pico.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_pico.fcmae_ft_in1k)|80.304|95.072|224 |9.07 |1.37 |6.1 |3274.57 |256 |
194
+ |[convnext_tiny_hnf.a2h_in1k](https://huggingface.co/timm/convnext_tiny_hnf.a2h_in1k)|82.216|95.852|224 |28.59 |4.47 |13.44 |2529.75 |256 |
195
+ |[convnext_tiny.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k)|82.898|96.616|224 |28.59 |4.47 |13.44 |2480.88 |256 |
196
+ |[convnext_tiny.in12k_ft_in1k](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k)|84.186|97.124|224 |28.59 |4.47 |13.44 |2433.7 |256 |
197
+ |[convnext_tiny.fb_in1k](https://huggingface.co/timm/convnext_tiny.fb_in1k)|82.066|95.854|224 |28.59 |4.47 |13.44 |2346.26 |256 |
198
+ |[convnextv2_nano.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in1k)|81.83 |95.738|224 |15.62 |2.46 |8.37 |2321.48 |256 |
199
+ |[convnextv2_nano.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k)|82.03 |96.166|224 |15.62 |2.46 |8.37 |2300.18 |256 |
200
+ |[convnext_small.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k)|84.562|97.394|224 |50.22 |8.71 |21.56 |1478.29 |256 |
201
+ |[convnext_small.in12k_ft_in1k](https://huggingface.co/timm/convnext_small.in12k_ft_in1k)|85.174|97.506|224 |50.22 |8.71 |21.56 |1474.31 |256 |
202
+ |[convnext_small.fb_in1k](https://huggingface.co/timm/convnext_small.fb_in1k)|83.142|96.434|224 |50.22 |8.71 |21.56 |1464.0 |256 |
203
+ |[convnextv2_tiny.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k)|83.894|96.964|224 |28.64 |4.47 |13.44 |1452.72 |256 |
204
+ |[convnextv2_tiny.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in1k)|82.92 |96.284|224 |28.64 |4.47 |13.44 |1425.62 |256 |
205
+ |[convnext_base.fb_in1k](https://huggingface.co/timm/convnext_base.fb_in1k)|83.82 |96.746|224 |88.59 |15.38 |28.75 |1054.0 |256 |
206
+ |[convnext_base.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k)|85.822|97.866|224 |88.59 |15.38 |28.75 |1037.66 |256 |
207
+ |[convnext_tiny.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k_384)|84.084|97.14 |384 |28.59 |13.14 |39.48 |862.95 |256 |
208
+ |[convnext_tiny.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k_384)|85.118|97.608|384 |28.59 |13.14 |39.48 |856.76 |256 |
209
+ |[convnext_base.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_base.clip_laion2b_augreg_ft_in1k)|86.154|97.68 |256 |88.59 |20.09 |37.55 |819.86 |256 |
210
+ |[convnextv2_nano.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k_384)|83.37 |96.742|384 |15.62 |7.22 |24.61 |801.72 |256 |
211
+ |[convnextv2_base.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in1k)|84.874|97.09 |224 |88.72 |15.38 |28.75 |625.33 |256 |
212
+ |[convnextv2_base.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k)|86.74 |98.022|224 |88.72 |15.38 |28.75 |624.23 |256 |
213
+ |[convnext_large.fb_in1k](https://huggingface.co/timm/convnext_large.fb_in1k)|84.282|96.892|224 |197.77 |34.4 |43.13 |584.28 |256 |
214
+ |[convnext_large.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k)|86.636|98.028|224 |197.77 |34.4 |43.13 |581.43 |256 |
215
+ |[convnext_small.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k_384)|85.778|97.886|384 |50.22 |25.58 |63.37 |518.95 |256 |
216
+ |[convnext_small.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_small.in12k_ft_in1k_384)|86.182|97.92 |384 |50.22 |25.58 |63.37 |516.19 |256 |
217
+ |[convnextv2_tiny.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k_384)|85.112|97.63 |384 |28.64 |13.14 |39.48 |491.32 |256 |
218
+ |[convnext_large_mlp.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_augreg_ft_in1k)|87.344|98.218|256 |200.13 |44.94 |56.33 |438.08 |256 |
219
+ |[convnextv2_large.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k)|87.26 |98.248|224 |197.96 |34.4 |43.13 |376.84 |256 |
220
+ |[convnextv2_large.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in1k)|85.742|97.584|224 |197.96 |34.4 |43.13 |375.23 |256 |
221
+ |[convnext_base.clip_laiona_augreg_ft_in1k_384](https://huggingface.co/timm/convnext_base.clip_laiona_augreg_ft_in1k_384)|86.504|97.97 |384 |88.59 |45.21 |84.49 |368.14 |256 |
222
+ |[convnext_xlarge.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k)|87.002|98.208|224 |350.2 |60.98 |57.5 |368.01 |256 |
223
+ |[convnext_base.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k_384)|86.796|98.264|384 |88.59 |45.21 |84.49 |366.54 |256 |
224
+ |[convnextv2_base.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k_384)|87.646|98.422|384 |88.72 |45.21 |84.49 |209.51 |256 |
225
+ |[convnext_large.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k_384)|87.476|98.382|384 |197.77 |101.1 |126.74|194.66 |256 |
226
+ |[convnextv2_huge.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in1k)|86.256|97.75 |224 |660.29 |115.0 |79.07 |154.72 |256 |
227
+ |[convnextv2_large.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k_384)|88.196|98.532|384 |197.96 |101.1 |126.74|128.94 |128 |
228
+ |[convnext_xlarge.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k_384)|87.75 |98.556|384 |350.2 |179.2 |168.99|124.85 |192 |
229
+ |[convnextv2_huge.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_384)|88.668|98.738|384 |660.29 |337.96|232.35|50.56 |64 |
230
+ |[convnextv2_huge.fcmae_ft_in22k_in1k_512](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_512)|88.848|98.742|512 |660.29 |600.81|413.07|28.58 |48 |
231
+
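The two tables above hold the same rows in different orders. As a small sketch of how such a table can be re-sorted by any column, the code below parses three rows copied from the table (with the link markup dropped for brevity) and orders them both ways. The helper `parse_row` and the `COLUMNS` tuple are hypothetical names introduced for this example.

```python
COLUMNS = ("top1", "top5", "img_size", "param_count", "gmacs",
           "macts", "samples_per_sec", "batch_size")


def parse_row(line):
    """Split one markdown table row into a dict with numeric fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, *numbers = cells
    return {"model": name, **{k: float(v) for k, v in zip(COLUMNS, numbers)}}


rows = [parse_row(line) for line in [
    "|convnext_large_mlp.clip_laion2b_augreg_ft_in1k|87.344|98.218|256 |200.13 |44.94 |56.33 |438.08 |256 |",
    "|convnext_base.clip_laion2b_augreg_ft_in1k|86.154|97.68 |256 |88.59 |20.09 |37.55 |819.86 |256 |",
    "|convnext_atto.d2_in1k|75.664|92.9 |224 |3.7 |0.55 |3.81 |8439.22 |256 |",
]]

by_top1 = sorted(rows, key=lambda r: r["top1"], reverse=True)
by_speed = sorted(rows, key=lambda r: r["samples_per_sec"], reverse=True)

print([r["model"] for r in by_top1])
print([r["model"] for r in by_speed])
```

For these three rows the two orderings are exact reverses of each other, which reflects the accuracy versus throughput trade-off visible across the full tables.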
232
+ ## Citation
233
+ ```bibtex
234
+ @software{ilharco_gabriel_2021_5143773,
235
+ author = {Ilharco, Gabriel and
236
+ Wortsman, Mitchell and
237
+ Wightman, Ross and
238
+ Gordon, Cade and
239
+ Carlini, Nicholas and
240
+ Taori, Rohan and
241
+ Dave, Achal and
242
+ Shankar, Vaishaal and
243
+ Namkoong, Hongseok and
244
+ Miller, John and
245
+ Hajishirzi, Hannaneh and
246
+ Farhadi, Ali and
247
+ Schmidt, Ludwig},
248
+ title = {OpenCLIP},
249
+ month = jul,
250
+ year = 2021,
251
+ note = {If you use this software, please cite it as below.},
252
+ publisher = {Zenodo},
253
+ version = {0.1},
254
+ doi = {10.5281/zenodo.5143773},
255
+ url = {https://doi.org/10.5281/zenodo.5143773}
256
+ }
257
+ ```
258
+ ```bibtex
259
+ @inproceedings{schuhmann2022laionb,
260
+ title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
261
+ author={Christoph Schuhmann and
262
+ Romain Beaumont and
263
+ Richard Vencu and
264
+ Cade W Gordon and
265
+ Ross Wightman and
266
+ Mehdi Cherti and
267
+ Theo Coombes and
268
+ Aarush Katta and
269
+ Clayton Mullis and
270
+ Mitchell Wortsman and
271
+ Patrick Schramowski and
272
+ Srivatsa R Kundurthy and
273
+ Katherine Crowson and
274
+ Ludwig Schmidt and
275
+ Robert Kaczmarczyk and
276
+ Jenia Jitsev},
277
+ booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
278
+ year={2022},
279
+ url={https://openreview.net/forum?id=M3Y74vmsMcY}
280
+ }
281
+ ```
282
+ ```bibtex
283
+ @misc{rw2019timm,
284
+ author = {Ross Wightman},
285
+ title = {PyTorch Image Models},
286
+ year = {2019},
287
+ publisher = {GitHub},
288
+ journal = {GitHub repository},
289
+ doi = {10.5281/zenodo.4414861},
290
+ howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
291
+ }
292
+ ```
293
+ ```bibtex
294
+ @inproceedings{Radford2021LearningTV,
295
+ title={Learning Transferable Visual Models From Natural Language Supervision},
296
+ author={Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
297
+ booktitle={ICML},
298
+ year={2021}
299
+ }
300
+ ```
301
+ ```bibtex
302
+ @article{liu2022convnet,
303
+ author = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
304
+ title = {A ConvNet for the 2020s},
305
+ journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
306
+ year = {2022},
307
+ }
308
+ ```
config.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "architecture": "convnext_large_mlp",
3
+ "num_classes": 1000,
4
+ "num_features": 1536,
5
+ "pretrained_cfg": {
6
+ "tag": "clip_laion2b_augreg_ft_in1k",
7
+ "custom_load": false,
8
+ "input_size": [
9
+ 3,
10
+ 256,
11
+ 256
12
+ ],
13
+ "fixed_input_size": false,
14
+ "interpolation": "bicubic",
15
+ "crop_pct": 1.0,
16
+ "crop_mode": "center",
17
+ "mean": [
18
+ 0.48145466,
19
+ 0.4578275,
20
+ 0.40821073
21
+ ],
22
+ "std": [
23
+ 0.26862954,
24
+ 0.26130258,
25
+ 0.27577711
26
+ ],
27
+ "num_classes": 1000,
28
+ "pool_size": [
29
+ 8,
30
+ 8
31
+ ],
32
+ "first_conv": "stem.0",
33
+ "classifier": "head.fc"
34
+ }
35
+ }
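The `mean` and `std` entries in `pretrained_cfg` describe per-channel input normalization, and `input_size` gives the expected channels, height and width. The sketch below shows, under the assumption that pixels are first scaled from 0-255 to the range [0, 1] and then standardized per channel, how those values would be applied to a single RGB pixel. The config excerpt is copied from the file above; `normalize_pixel` is a hypothetical helper, not a `timm` function.

```python
import json

# Excerpt of the config.json added in this commit.
config_text = """
{
  "pretrained_cfg": {
    "input_size": [3, 256, 256],
    "mean": [0.48145466, 0.4578275, 0.40821073],
    "std": [0.26862954, 0.26130258, 0.27577711]
  }
}
"""
cfg = json.loads(config_text)["pretrained_cfg"]


def normalize_pixel(rgb, mean, std):
    """Scale 0-255 RGB values to [0, 1], then standardize each channel."""
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std)]


channels, height, width = cfg["input_size"]
mid_gray = normalize_pixel((128, 128, 128), cfg["mean"], cfg["std"])
print(channels, height, width)
print(mid_gray)
```

In practice `timm.data.resolve_model_data_config` and `timm.data.create_transform`, as used in the README examples, read these values for you, so a manual version like this is mainly useful for checking a custom preprocessing pipeline.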
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f8be99b2929e79e0e59483104a7e1bc1eb7d2a56badcd78e655cdb46310edbe
3
+ size 800639221
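The three lines added for `pytorch_model.bin` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. The following sketch parses that pointer and shows how a downloaded file could be checked against it. The helpers `parse_lfs_pointer` and `matches_pointer` are hypothetical names for this example, and the in-memory check is demonstrated on a tiny byte string instead of the actual 800 MB file.

```python
import hashlib

pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:6f8be99b2929e79e0e59483104a7e1bc1eb7d2a56badcd78e655cdb46310edbe
size 800639221
"""


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and digest the pointer records."""
    actual = hashlib.new(pointer["algo"], data).hexdigest()
    return len(data) == pointer["size"] and actual == pointer["digest"]


pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])

# Demonstration on a small payload with a pointer built to match it.
demo = {"algo": "sha256", "digest": hashlib.sha256(b"hello").hexdigest(), "size": 5}
print(matches_pointer(b"hello", demo))
```

For the real checkpoint you would hash the file in chunks instead of loading it fully into memory. The recorded size of about 800.6 MB is roughly what 200.13 M parameters stored as 4-byte floats would occupy, which suggests, but does not prove, float32 weights.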