timm /

Image Classification · timm · PyTorch · Safetensors

rwightman committed
Commit 2217173 · 1 Parent(s): 7f59c56

Update model config and README

Files changed (2):
  1. README.md +108 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,6 +2,112 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-21k
  ---
- # Model card for vit_base_patch32_224.augreg_in21k
+ # Model card for vit_base_patch32_224.augreg_in21k
+
+ A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k (with additional augmentation and regularization) in JAX by the paper authors and ported to PyTorch by Ross Wightman.
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 104.3
+   - GMACs: 4.4
+   - Activations (M): 4.2
+   - Image size: 224 x 224
+ - **Papers:**
+   - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: https://arxiv.org/abs/2106.10270
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-21k
+ - **Original:** https://github.com/google-research/vision_transformer
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_base_patch32_224.augreg_in21k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
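The `torch.topk(output.softmax(dim=1) * 100, k=5)` line in the snippet above returns percentage-scaled class probabilities paired with their indices into the model's label space. The same computation can be sketched in plain Python (the logits here are illustrative stand-ins, not real model output):

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# illustrative 4-class logits; the real model emits one score per class
logits = [0.1, 2.0, -1.0, 0.5]
percentages = [100.0 * p for p in softmax(logits)]

# top-k = highest scores with their class indices, like torch.topk
top2 = sorted(enumerate(percentages), key=lambda pair: -pair[1])[:2]
print(top2)
```

Each entry of `top2` pairs a class index with its percentage score; `torch.topk` returns the same information as two tensors (values and indices).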
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_base_patch32_224.augreg_in21k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 50, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
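The `(1, 50, 768)` shape of the unpooled output above follows directly from the model's patch geometry: a 224 x 224 image split into 32 x 32 patches gives a 7 x 7 grid of patch tokens, plus one class (CLS) token:

```python
img_size, patch_size, embed_dim = 224, 32, 768

grid = img_size // patch_size      # 7 patches per side
num_patches = grid * grid          # 49 patch tokens
num_tokens = num_patches + 1       # + 1 class (CLS) token
print((1, num_tokens, embed_dim))  # -> (1, 50, 768)
```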
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in the timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @article{steiner2021augreg,
+   title={How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers},
+   author={Steiner, Andreas and Kolesnikov, Alexander and Zhai, Xiaohua and Wightman, Ross and Uszkoreit, Jakob and Beyer, Lucas},
+   journal={arXiv preprint arXiv:2106.10270},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6a44f9ddf82d641c34557bac71bc740164d54c302f87a8937c2fdcd82828908
+ size 417024088
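The `model.safetensors` file above is a Git LFS pointer: it records the real object's SHA-256 and byte size rather than the weights themselves. A local download can be checked against both fields with a short stdlib script (a sketch; the local path is a placeholder):

```python
import hashlib
import os

# values copied from the LFS pointer above
EXPECTED_SHA256 = "e6a44f9ddf82d641c34557bac71bc740164d54c302f87a8937c2fdcd82828908"
EXPECTED_SIZE = 417024088

def sha256_of_file(path, chunk_size=1 << 20):
    # stream the file in chunks so a ~400 MB checkpoint is never fully in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path):
    # cheap size check first, then the full digest
    return (os.path.getsize(path) == EXPECTED_SIZE
            and sha256_of_file(path) == EXPECTED_SHA256)

# verify("model.safetensors")  # placeholder path; True only if the download is intact
```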