rwightman HF staff committed on
Commit
c7877d3
1 Parent(s): cff3e66

Update model config and README

Files changed (2)
  1. README.md +101 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,6 +2,105 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - imagenet-21k
  ---
- # Model card for vit_base_patch16_224.orig_in21k_ft_in1k
+ # Model card for vit_base_patch16_224.orig_in21k_ft_in1k
+
+ A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k and fine-tuned on ImageNet-1k in JAX by paper authors, ported to PyTorch by Ross Wightman.
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 86.6
+   - GMACs: 16.9
+   - Activations (M): 16.5
+   - Image size: 224 x 224
+ - **Papers:**
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-1k
+ - **Pretrain Dataset:** ImageNet-21k
+ - **Original:** https://github.com/google-research/vision_transformer
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # required for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_base_patch16_224.orig_in21k_ft_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_base_patch16_224.orig_in21k_ft_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 197, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
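
The figures quoted in the README above (86.6 M parameters, an unpooled `(1, 197, 768)` feature tensor) can be cross-checked by hand. The sketch below is a back-of-the-envelope calculation, not code from timm. It assumes the standard ViT-Base/16 hyperparameters from the cited paper (12 transformer blocks, width 768, MLP width 3072, one class token, 1000 ImageNet-1k classes); only the 224 image size, the patch size of 16, and the 768 width appear in this commit. All function names are illustrative.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 16,
                    class_token: bool = True) -> int:
    """Number of tokens the transformer sees for a square input image."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if class_token else 0)


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


def vit_param_count(image_size: int = 224, patch_size: int = 16,
                    in_chans: int = 3, dim: int = 768, depth: int = 12,
                    mlp_dim: int = 3072, num_classes: int = 1000) -> int:
    """Parameter count of a plain ViT (assumed ViT-Base/16 defaults)."""
    tokens = vit_token_count(image_size, patch_size)
    # The patch embedding is a strided conv, equivalent to a linear layer
    # applied to each flattened patch.
    patch_embed = linear_params(patch_size * patch_size * in_chans, dim)
    cls_token = dim
    pos_embed = tokens * dim
    layer_norm = 2 * dim  # scale and shift
    block = (
        layer_norm
        + linear_params(dim, 3 * dim)   # fused query/key/value projection
        + linear_params(dim, dim)       # attention output projection
        + layer_norm
        + linear_params(dim, mlp_dim)   # MLP expansion
        + linear_params(mlp_dim, dim)   # MLP contraction
    )
    head = linear_params(dim, num_classes)
    return patch_embed + cls_token + pos_embed + depth * block + layer_norm + head


tokens = vit_token_count()
params = vit_param_count()
print(tokens)                 # 197
print(round(params / 1e6, 1))  # 86.6
```

Under these assumptions the token count matches the middle dimension of the unpooled output, and the total rounds to the `Params (M)` value listed under Model Stats.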
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:669b949ea91fd19217f200cee259780bde32210c1eb9a5af3859f0dd8346b2ec
+ size 346284714
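
The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object id, and the byte size of the real file. As a rough illustration, the sketch below parses that pointer text with the standard library and checks the size against the parameter count. The helper name `parse_lfs_pointer` is hypothetical, and the assumption that the weights are stored as 4-byte float32 values is not stated in the commit.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:669b949ea91fd19217f200cee259780bde32210c1eb9a5af3859f0dd8346b2ec
size 346284714
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], len(pointer["digest"]))  # sha256 64

# Assuming float32 storage (4 bytes per value), the file size gives an
# upper bound on the number of stored values, in millions. The file also
# contains a metadata header, so the true count is slightly lower.
approx_values_m = pointer["size"] / 4 / 1e6
print(round(approx_values_m, 1))  # 86.6
```

If the float32 assumption holds, the size is consistent with the 86.6 M parameters reported in the README.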