Commit aab059f, committed by rwightman (HF staff)
1 Parent(s): 21198df
Files changed (4)
  1. README.md +105 -0
  2. config.json +33 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ tags:
+ - image-classification
+ - timm
+ library_tag: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - imagenet-21k
+ ---
+ # Model card for mixer_b16_224.goog_in21k_ft_in1k
+
+ An MLP-Mixer image classification model. Pretrained on ImageNet-21k and fine-tuned on ImageNet-1k by the paper authors.
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 59.9
+   - GMACs: 12.6
+   - Activations (M): 14.5
+   - Image size: 224 x 224
+ - **Papers:**
+   - MLP-Mixer: An all-MLP Architecture for Vision: https://arxiv.org/abs/2105.01601
+ - **Original:** https://github.com/google-research/vision_transformers
+ - **Dataset:** ImageNet-1k
+ - **Pretrain Dataset:** ImageNet-21k
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('mixer_b16_224.goog_in21k_ft_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = (output.softmax(dim=1) * 100).topk(k=5)  # top-5 class percentages and indices
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'mixer_b16_224.goog_in21k_ft_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 196, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @article{tolstikhin2021mixer,
+   title={MLP-Mixer: An all-MLP Architecture for Vision},
+   author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Steiner, Andreas and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
+   journal={arXiv preprint arXiv:2105.01601},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
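To make the final top-5 step of the classification snippet concrete, here is a minimal pure-Python sketch of what that step computes for one row of logits: a softmax over classes, scaled to percentages, followed by a top-k selection. It is not the PyTorch or timm implementation; the helper name `top_k_percentages` and the toy logits are hypothetical and exist only for illustration.

```python
import math


def top_k_percentages(logits, k=5):
    """Return (percentages, indices) of the k largest softmax scores.

    Hypothetical helper: softmax over one row of logits, scaled to
    percent, then the k highest-scoring class indices.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    percents = [100.0 * e / total for e in exps]

    # Sort class indices by descending score and keep the first k.
    order = sorted(range(len(percents)), key=lambda i: percents[i], reverse=True)[:k]
    return [percents[i] for i in order], order


# Toy logits for a 6-class problem (made-up values, not model output).
toy_logits = [0.1, 2.0, -1.0, 3.5, 0.0, 1.2]
top_percents, top_indices = top_k_percentages(toy_logits, k=5)
```

With the real model, each returned index would be looked up in the ImageNet-1k label list to obtain a class name; that mapping is not part of this commit.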
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "architecture": "mixer_b16_224",
+   "num_classes": 1000,
+   "num_features": 768,
+   "global_pool": "avg",
+   "pretrained_cfg": {
+     "tag": "goog_in21k_ft_in1k",
+     "custom_load": false,
+     "input_size": [
+       3,
+       224,
+       224
+     ],
+     "fixed_input_size": true,
+     "interpolation": "bicubic",
+     "crop_pct": 0.875,
+     "crop_mode": "center",
+     "mean": [
+       0.5,
+       0.5,
+       0.5
+     ],
+     "std": [
+       0.5,
+       0.5,
+       0.5
+     ],
+     "num_classes": 1000,
+     "pool_size": null,
+     "first_conv": "stem.proj",
+     "classifier": "head"
+   }
+ }
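As a rough illustration of how the `pretrained_cfg` values above relate to evaluation preprocessing, the sketch below derives a resize target from `crop_pct`, applies the `mean`/`std` normalization to a single pixel value, and counts patch tokens. It assumes the common resize-then-center-crop convention (resize to `floor(224 / 0.875)`, then crop to 224 x 224); the actual pipeline is whatever `timm.data.create_transform` builds from this config. The patch size of 16 is an assumption read off the `b16` in the model name, and all helper names are hypothetical.

```python
import json
import math

# Subset of the config.json fields shown above.
CONFIG = json.loads("""
{
  "num_features": 768,
  "pretrained_cfg": {
    "input_size": [3, 224, 224],
    "crop_pct": 0.875,
    "mean": [0.5, 0.5, 0.5],
    "std": [0.5, 0.5, 0.5]
  }
}
""")


def resize_target(pretrained_cfg):
    """Resize length before the center crop (assumed convention)."""
    crop_size = pretrained_cfg["input_size"][-1]
    return math.floor(crop_size / pretrained_cfg["crop_pct"])


def normalize(value, mean, std):
    """Normalize one channel value that is already scaled to [0, 1]."""
    return (value - mean) / std


def token_count(pretrained_cfg, patch_size=16):
    """Number of non-overlapping patches; patch_size=16 is an assumption."""
    _, height, width = pretrained_cfg["input_size"]
    return (height // patch_size) * (width // patch_size)


cfg = CONFIG["pretrained_cfg"]
resize_to = resize_target(cfg)                             # resize before cropping to 224
darkest = normalize(0.0, cfg["mean"][0], cfg["std"][0])    # black pixel after normalization
brightest = normalize(1.0, cfg["mean"][0], cfg["std"][0])  # white pixel after normalization
tokens = token_count(cfg)
```

Under these assumptions, a mean and std of 0.5 map pixel values from [0, 1] to [-1, 1], and the token count agrees with the `(1, 196, 768)` unpooled shape noted in the README, where 768 is `num_features`.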
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ee4ad39c4cbb20b68e3e09391411634f8bce3ebdb68a18830274a37cf8c1728
+ size 239536434
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53ad348971ebd7152b46062d251ff565bac3b937ba803e1eb89f03e6b538e7a5
+ size 239577701
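The two weight files are stored as Git LFS pointers, each recording a SHA-256 digest and a byte size. The sketch below shows one way a downloaded file could be checked against such a pointer using only the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the `model.safetensors` entry above without the diff `+` prefixes.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8ee4ad39c4cbb20b68e3e09391411634f8bce3ebdb68a18830274a37cf8c1728
size 239536434
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

A call such as `matches_pointer("model.safetensors", pointer)` would then check a local copy, where the path is a placeholder for wherever the file was downloaded.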