---
license: other
datasets:
- imagenet-1k
---
[**FasterViT: Fast Vision Transformers with Hierarchical Attention**](https://arxiv.org/abs/2306.06189).

FasterViT achieves a new SOTA Pareto front in terms of accuracy vs. image throughput without extra training data!

<p align="center">
<img src="https://github.com/NVlabs/FasterViT/assets/26806394/253d1a2e-b5f5-4a9b-a362-6cdd16bfccc1" width=62% height=62%
class="center">
</p>

Note: Please use the [**latest NVIDIA TensorRT release**](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html) to enjoy the benefits of optimized FasterViT ops.

## Quick Start

Pre-trained FasterViT models can be imported with **one line of code**. First, install FasterViT:

```bash
pip install fastervit
```

A pretrained FasterViT model with default hyper-parameters can be created as follows:

```python
>>> from fastervit import create_model

# Define fastervit-0 model with 224 x 224 resolution

>>> model = create_model('faster_vit_0_224',
                          pretrained=True,
                          model_path="/tmp/faster_vit_0.pth.tar")
```

`model_path` sets the location to which the pretrained checkpoint is downloaded.

We can also simply test the model by passing a dummy input image. The output is the logits:

```python
>>> import torch

>>> image = torch.rand(1, 3, 224, 224)
>>> output = model(image)  # torch.Size([1, 1000])
```
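
Since the output is raw logits over the 1,000 ImageNet-1K classes, they can be turned into class probabilities with a softmax. A minimal, illustrative sketch for retrieving the top-5 predictions (not part of the `fastervit` API itself):

```python
>>> probs = torch.softmax(output, dim=-1)          # class probabilities, shape [1, 1000]
>>> top5_prob, top5_idx = probs.topk(5, dim=-1)    # highest-scoring ImageNet-1K class indices
>>> print(top5_idx[0].tolist(), top5_prob[0].tolist())
```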

We can also use the any-resolution FasterViT model to accommodate arbitrary image resolutions. In the following, we define an any-resolution FasterViT-0 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of 64:

```python
>>> from fastervit import create_model

# Define any-resolution FasterViT-0 model with 576 x 960 resolution
>>> model = create_model('faster_vit_0_any_res',
                          resolution=[576, 960],
                          window_size=[7, 7, 12, 6],
                          ct_size=2,
                          dim=64,
                          pretrained=True)
```
Note that the above model is initialized from the original ImageNet pre-trained FasterViT with its original 224 x 224 resolution. As a result, missing keys and mismatches are expected, since new layers (e.g. additional carrier tokens) are being added.

We can simply test the model by passing a dummy input image. The output is the logits:

```python
>>> import torch

>>> image = torch.rand(1, 3, 576, 960)
>>> output = model(image)  # torch.Size([1, 1000])
```
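
To benefit from the TensorRT optimizations mentioned above, a common route is to first export the model to ONNX and then build a TensorRT engine from it (e.g. with `trtexec`). The following is only a rough sketch under those assumptions; the output file name and opset version are arbitrary choices and not part of the `fastervit` API:

```python
import torch

model.eval()
dummy = torch.rand(1, 3, 576, 960)  # match the resolution the model was created with

# Export to ONNX; the resulting file can then be consumed by TensorRT.
torch.onnx.export(
    model,
    dummy,
    "faster_vit_0_any_res.onnx",  # hypothetical output path
    input_names=["input"],
    output_names=["logits"],
    opset_version=17,
)
```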

---

## Results + Pretrained Models

### ImageNet-1K
**FasterViT ImageNet-1K Pretrained Models**

| Name | Acc@1 (%) | Acc@5 (%) | Throughput (Img/Sec) | Resolution | #Params (M) | FLOPs (G) | Download |
|------|-----------|-----------|----------------------|------------|-------------|-----------|----------|
| FasterViT-0 | 82.1 | 95.9 | 5802 | 224x224 | 31.4 | 3.3 | [model](https://drive.google.com/uc?export=download&id=1twI2LFJs391Yrj8MR4Ui9PfrvWqjE1iB) |
| FasterViT-1 | 83.2 | 96.5 | 4188 | 224x224 | 53.4 | 5.3 | [model](https://drive.google.com/uc?export=download&id=1r7W10n5-bFtM3sz4bmaLrowN2gYPkLGT) |
| FasterViT-2 | 84.2 | 96.8 | 3161 | 224x224 | 75.9 | 8.7 | [model](https://drive.google.com/uc?export=download&id=1n_a6s0pgi0jVZOGmDei2vXHU5E6RH5wU) |
| FasterViT-3 | 84.9 | 97.2 | 1780 | 224x224 | 159.5 | 18.2 | [model](https://drive.google.com/uc?export=download&id=1tvWElZ91Sia2SsXYXFMNYQwfipCxtI7X) |
| FasterViT-4 | 85.4 | 97.3 | 849 | 224x224 | 424.6 | 36.6 | [model](https://drive.google.com/uc?export=download&id=1gYhXA32Q-_9C5DXel17avV_ZLoaHwdgz) |
| FasterViT-5 | 85.6 | 97.4 | 449 | 224x224 | 975.5 | 113.0 | [model](https://drive.google.com/uc?export=download&id=1mqpai7XiHLr_n1tjxjzT8q369xTCq_z-) |
| FasterViT-6 | 85.8 | 97.4 | 352 | 224x224 | 1360.0 | 142.0 | [model](https://drive.google.com/uc?export=download&id=12jtavR2QxmMzcKwPzWe7kw-oy34IYi59) |
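
Throughput numbers such as those above depend heavily on hardware, precision and batch size. As a rough, hypothetical sketch (not the script used to produce this table), images per second for a given model and GPU can be estimated as follows; the batch size, warmup and iteration counts are arbitrary choices:

```python
import time
import torch

from fastervit import create_model

# Hypothetical throughput estimate; results will differ from the table depending on setup.
model = create_model('faster_vit_0_224', pretrained=True).cuda().eval()
batch = torch.rand(64, 3, 224, 224, device='cuda')

with torch.no_grad():
    for _ in range(10):            # warmup iterations
        model(batch)
    torch.cuda.synchronize()
    start = time.time()
    iters = 50
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"~{iters * batch.shape[0] / elapsed:.0f} images/sec")
```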

### Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)

All models use `crop_pct=0.875`. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.

| Name | A-Acc@1 (%) | A-Acc@5 (%) | R-Acc@1 (%) | R-Acc@5 (%) | V2-Acc@1 (%) | V2-Acc@5 (%) |
|------|-------------|-------------|-------------|-------------|--------------|--------------|
| FasterViT-0 | 23.9 | 57.6 | 45.9 | 60.4 | 70.9 | 90.0 |
| FasterViT-1 | 31.2 | 63.3 | 47.5 | 61.9 | 72.6 | 91.0 |
| FasterViT-2 | 38.2 | 68.9 | 49.6 | 63.4 | 73.7 | 91.6 |
| FasterViT-3 | 44.2 | 73.0 | 51.9 | 65.6 | 75.0 | 92.2 |
| FasterViT-4 | 49.0 | 75.4 | 56.0 | 69.6 | 75.7 | 92.7 |
| FasterViT-5 | 52.7 | 77.6 | 56.9 | 70.0 | 76.0 | 93.0 |
| FasterViT-6 | 53.7 | 78.4 | 57.1 | 70.1 | 76.1 | 93.0 |

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2, respectively.
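
For reference, `crop_pct=0.875` at a 224 x 224 input corresponds to resizing the shorter image side to 256 (224 / 0.875) before center-cropping. A minimal torchvision-based evaluation transform sketch, assuming the standard ImageNet normalization statistics (illustrative only, not the exact evaluation pipeline):

```python
from torchvision import transforms

crop_pct, img_size = 0.875, 224
scale_size = int(img_size / crop_pct)  # 224 / 0.875 = 256

eval_transform = transforms.Compose([
    transforms.Resize(scale_size),                 # resize shorter side to 256
    transforms.CenterCrop(img_size),               # then center-crop to 224 x 224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```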

## Citation

Please consider citing FasterViT if this repository is useful for your work.

```bibtex
@article{hatamizadeh2023fastervit,
  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},
  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},
  journal={arXiv preprint arXiv:2306.06189},
  year={2023}
}
```

## Licenses

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click [here](LICENSE) to view a copy of this license.

For license information regarding the timm repository, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).

For license information regarding the ImageNet dataset, please see the [ImageNet official website](https://www.image-net.org/).

## Acknowledgement
This repository is built on top of the [timm](https://github.com/huggingface/pytorch-image-models) repository. We thank [Ross Wightman](https://rwightman.com/) for creating and maintaining this high-quality library.