rwightman HF staff committed on
Commit
025d2c4
1 Parent(s): 3c2b570

Update model config and README

Files changed (3)
  1. README.md +104 -2
  2. config.json +1 -1
  3. model.safetensors +3 -0
README.md CHANGED
@@ -2,6 +2,108 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-12k
  ---
- # Model card for timm/vit_medium_patch16_gap_240.in12k
+ # Model card for vit_medium_patch16_gap_240.sw_in12k
+
+ A Vision Transformer (ViT) image classification model. This is a `timm`-specific variation of the architecture that uses global average pooling over tokens. Trained on ImageNet-12k by Ross Wightman in `timm` using the recipe template described below.
+
+ Recipe details:
+ * Based on the Swin Transformer train / pretrain recipe with modifications (related to both the DeiT and ConvNeXt recipes)
+ * AdamW optimizer, gradient clipping, EMA weight averaging
+ * Cosine LR schedule with warmup
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 44.4
+   - GMACs: 8.6
+   - Activations (M): 12.6
+   - Image size: 240 x 240
+ - **Papers:**
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-12k
+ - **Original:** https://github.com/huggingface/pytorch-image-models
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # required for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_medium_patch16_gap_240.sw_in12k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ # top-5 class probabilities (in percent) and their class indices
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_medium_patch16_gap_240.sw_in12k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 225, 512) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in the timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
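The `(1, 225, 512)` shape quoted in the README's embeddings example can be checked against the other numbers in this commit (image size 240 x 240, `num_features` 512). A minimal sketch of that arithmetic, assuming non-overlapping 16 x 16 patches (read from `patch16` in the model name) and no class token (read from `gap`, global average pooling). The helper name is hypothetical and is not part of `timm`:

```python
def unpooled_feature_shape(image_size, patch_size, embed_dim,
                           batch_size=1, num_prefix_tokens=0):
    """Return the (batch, tokens, channels) shape of unpooled ViT features.

    Assumes a square image split into non-overlapping square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_tokens = patches_per_side ** 2 + num_prefix_tokens
    return (batch_size, num_tokens, embed_dim)


# 240 / 16 = 15 patches per side, 15 * 15 = 225 tokens, 512 channels each
shape = unpooled_feature_shape(image_size=240, patch_size=16, embed_dim=512)
print(shape)  # (1, 225, 512)
```

If the model did carry a class token, passing `num_prefix_tokens=1` would give 226 tokens instead.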
config.json CHANGED
@@ -4,7 +4,7 @@
    "num_features": 512,
    "global_pool": "avg",
    "pretrained_cfg": {
-     "tag": "in12k",
+     "tag": "sw_in12k",
      "custom_load": false,
      "input_size": [
        3,
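The `tag` change from `in12k` to `sw_in12k` is what lines the config up with the `vit_medium_patch16_gap_240.sw_in12k` name used in the README. A small sketch of that relationship, assuming the convention that a pretrained model name has the form `<architecture>.<tag>`. The helper below is illustrative only and is not `timm`'s own parsing code:

```python
def split_name_and_tag(model_name):
    """Split '<architecture>.<tag>' at the first dot; tag is '' if absent."""
    architecture, _, tag = model_name.partition('.')
    return architecture, tag


architecture, tag = split_name_and_tag('vit_medium_patch16_gap_240.sw_in12k')

# The value written to config.json by this commit
pretrained_cfg = {'tag': 'sw_in12k'}

print(architecture)                  # vit_medium_patch16_gap_240
print(tag == pretrained_cfg['tag'])  # True
```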
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c69daacafcfdbe74131e323ab7d8d11dbad67c0bad65857f7249251ceae1fe8e
+ size 177601244
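The three added lines are a Git LFS pointer, not the weights themselves. As a rough sanity check, the pointer's `size` can be compared with the `Params (M): 44.4` figure in the README. The sketch below assumes the weights are stored as float32 (4 bytes per parameter), which this commit does not state, and it ignores the small safetensors header. The parsing helper is hypothetical:

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(' ')
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c69daacafcfdbe74131e323ab7d8d11dbad67c0bad65857f7249251ceae1fe8e\n"
    "size 177601244\n"
)

size_bytes = int(pointer['size'])

# Assumption: float32 weights, 4 bytes per parameter; header overhead ignored.
approx_params_millions = size_bytes / 4 / 1e6
print(round(approx_params_millions, 1))  # 44.4
```

Under that assumption the file size is consistent with the parameter count listed in the model card.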