---
tags:
- image-classification
- timm
library_name: timm
license: apache-2.0
datasets:
- imagenet-21k
- iloncka/mosal
---
# Model card for vit_tiny_patch16_224.augreg_in21k

A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k (with additional augmentation and regularization) in JAX by the paper authors, and ported to PyTorch by Ross Wightman.

## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 9.7
  - GMACs: 1.1
  - Activations (M): 4.1
  - Image size: 224 x 224
- **Papers:**
  - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: https://arxiv.org/abs/2106.10270
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
- **Dataset:** ImageNet-21k
- **Original:** https://github.com/google-research/vision_transformer
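
The parameter count above can be sanity-checked locally; this is a minimal sketch (not part of the original card) that only builds the architecture, without downloading weights:

```python
import timm

# instantiate the architecture only; no pretrained weights are needed to count parameters
model = timm.create_model('vit_tiny_patch16_224.augreg_in21k', pretrained=False)
n_params = sum(p.numel() for p in model.parameters())
print(f'params: {n_params / 1e6:.1f}M')  # expected to print ~9.7M, matching the stats above
```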

## Model Usage
### Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_tiny_patch16_224.augreg_in21k', pretrained=True)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
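
The top-5 results are left as raw tensors; one minimal way to inspect them (a sketch, assuming the snippet above has just been run) is:

```python
# print each of the top-5 class indices with its softmax percentage;
# mapping ImageNet-21k indices to synset names requires a separate label file
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f'class index {idx.item()}: {prob.item():.2f}%')
```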

### Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_tiny_patch16_224.augreg_in21k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 197, 192) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
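
A common use for such embeddings is image-to-image similarity. The sketch below assumes `model` (created with `num_classes=0`) and `transforms` from the block above, plus a hypothetical second PIL image `img2`:

```python
import torch.nn.functional as F

emb1 = model(transforms(img).unsqueeze(0))    # (1, num_features)
emb2 = model(transforms(img2).unsqueeze(0))   # img2 is a placeholder for any second image
similarity = F.cosine_similarity(emb1, emb2)  # shape (1,), values in [-1, 1]
```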

## Model Comparison
Explore the dataset and runtime metrics of this model in the timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
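
Related pretrained checkpoints in the same family can also be enumerated directly from timm's model registry; a small sketch:

```python
import timm

# list the pretrained ViT-Tiny patch16 checkpoints available through timm
print(timm.list_models('vit_tiny_patch16*', pretrained=True))
```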

## Citation
```bibtex
@article{steiner2021augreg,
  title={How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers},
  author={Steiner, Andreas and Kolesnikov, Alexander and Zhai, Xiaohua and Wightman, Ross and Uszkoreit, Jakob and Beyer, Lucas},
  journal={arXiv preprint arXiv:2106.10270},
  year={2021}
}
```
```bibtex
@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}
```
```bibtex
@misc{rw2019timm,
  author={Ross Wightman},
  title={PyTorch Image Models},
  year={2019},
  publisher={GitHub},
  journal={GitHub repository},
  doi={10.5281/zenodo.4414861},
  howpublished={\url{https://github.com/huggingface/pytorch-image-models}}
}
```