---
tags:
- image-classification
library_name: coreml
license: other
license_name: apple-ascl
license_link: LICENSE
datasets:
- imagenet-1k
---
# FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Please observe the [original license](https://github.com/apple/ml-fastvit/blob/8af5928238cab99c45f64fc3e4e7b1516b8224ba/LICENSE).

## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 4.0
  - GMACs: 0.7
  - Activations (M): 8.6
  - Image size: 256 x 256
- **Papers:**
  - FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: https://arxiv.org/abs/2303.14189
- **Original:** https://github.com/apple/ml-fastvit
- **Dataset:** ImageNet-1k
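A minimal sketch of running one of these Core ML models from Python with `coremltools` and Pillow. Several details are assumptions and are not stated in this card: that the first declared input is the image, that one output is a `{label: probability}` dictionary (as Core ML classifiers return), and the file paths, which are placeholders. Inspect `model.get_spec()` to confirm the real input and output names before relying on it.

```python
from typing import Dict, List, Tuple


def top_k(probs: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def classify(model_path: str, image_path: str, k: int = 5) -> List[Tuple[str, float]]:
    """Run one image through a Core ML classifier and return its top-k labels.

    Core ML prediction through coremltools runs on macOS; this also needs
    the coremltools and Pillow packages installed.
    """
    # Imported here so top_k() stays usable without coremltools or Pillow.
    import coremltools as ct
    from PIL import Image

    model = ct.models.MLModel(model_path)
    spec = model.get_spec()
    # Assumption: the first declared input is the image; check the spec to confirm.
    input_name = spec.description.input[0].name
    # 256 x 256 matches the image size listed under Model Stats.
    image = Image.open(image_path).convert("RGB").resize((256, 256))
    outputs = model.predict({input_name: image})
    # Assumption: one output is a {label: probability} dict; adapt this if
    # the model exposes raw logits instead.
    probs = next((v for v in outputs.values() if isinstance(v, dict)), None)
    if probs is None:
        raise ValueError("no {label: probability} output found; inspect model.get_spec()")
    return top_k(probs, k)
```

For example, `classify("path/to/model.mlpackage", "path/to/image.jpg")` returns the five most probable labels; both paths are hypothetical placeholders.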

## Evaluation - Variants

| Variant | Parameters | Size (MB) | Weight precision | Activation precision | Δ accuracy vs. PyTorch |
| ------- | ---------: | --------: | ---------------- | -------------------- | ---------------------: |
| T8      |       3.6M |       7.8 | Float16          | Float16              |                  -0.9% |
| MA36    |      42.7M |        84 | Float16          | Float16              |                 -0.06% |
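As a rough sanity check on the Size column: Float16 stores each weight in 2 bytes, so weight storage is about 2 bytes times the parameter count, roughly 7.2 MB for T8 and 85.4 MB for MA36. This sketch is only an estimate (1 MB taken as 10^6 bytes, parameter counts already rounded, and no allowance for anything else the model package stores); it is not how the listed sizes were measured.

```python
BYTES_PER_FLOAT16 = 2  # each Float16 weight occupies 2 bytes


def float16_weight_mb(num_params: float) -> float:
    """Approximate Float16 weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_FLOAT16 / 1e6


# Parameter counts and listed sizes taken from the table above.
for variant, params, listed_mb in [("T8", 3.6e6, 7.8), ("MA36", 42.7e6, 84.0)]:
    print(f"{variant}: ~{float16_weight_mb(params):.1f} MB estimated, {listed_mb} MB listed")
```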



## Evaluation - Inference time

| Variant | Device            | OS         | Inference time (ms) | Dominant compute unit |
| ------- | ----------------- | ---------- | ------------------: | --------------------- |
| T8      | iPhone 12 Pro Max | iOS 17.5   |                0.79 | Neural Engine         |
| T8      | M3 Max            | macOS 14.4 |                0.62 | Neural Engine         |
| MA36    | iPhone 12 Pro Max | iOS 18.0   |                4.50 | Neural Engine         |
| MA36    | M3 Max            | macOS 15.0 |                2.99 | Neural Engine         |
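This card does not say how the times above were measured. As a minimal sketch for getting comparable numbers on your own machine, the helper below times any zero-argument prediction callable, discards warm-up calls (the first predictions can include model compilation and loading), and reports the median. Timing from Python adds interpreter and input-conversion overhead, so expect results above those in the table; the helper name and the warm-up and run counts are illustrative choices, not part of any library.

```python
import statistics
import time
from typing import Any, Callable


def median_latency_ms(predict: Callable[[], Any], warmup: int = 10, runs: int = 100) -> float:
    """Median wall-clock latency of predict() in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        predict()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional slow runs.
    return statistics.median(samples)
```

For a model loaded with `ct.models.MLModel(path, compute_units=ct.ComputeUnit.CPU_AND_NE)`, pass something like `lambda: model.predict({input_name: image})`, where `input_name` and `image` are prepared as for a normal prediction.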

## Citation
```bibtex
@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}
```