FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Model Details

Model Type: Image classification
Model Stats:
- Params (M): 44.1
- GMACs: 7.8
- Activations (M): 40.4
- Image size: 256 x 256
Papers:
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: https://arxiv.org/abs/2303.14189
Original: https://github.com/apple/ml-fastvit
Dataset: ImageNet-1k

Evaluation - Variants

Variant	Parameters	Size (MB)	Weight precision	Activation precision	Δ PyTorch accuracy
T8	3.6M	7.8	Float16	Float16	-0.9%
MA36	42.7M	84	Float16	Float16	-0.06%
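As a rough sanity check on the Size column, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes 2 bytes per Float16 weight and 1 MB = 10^6 bytes; the helper name `estimate_size_mb` is illustrative and not part of any FastViT tooling.

```python
# Rough weight-size estimate: parameters x bytes per parameter.
# Assumption: every weight is stored as Float16 (2 bytes) and 1 MB = 10^6 bytes.
BYTES_PER_FLOAT16 = 2

# Parameter counts (millions) and listed sizes (MB) from the variants table.
variants = {
    "T8": {"params_m": 3.6, "listed_mb": 7.8},
    "MA36": {"params_m": 42.7, "listed_mb": 84.0},
}


def estimate_size_mb(params_m: float, bytes_per_param: int = BYTES_PER_FLOAT16) -> float:
    """Estimated weight size in MB, ignoring metadata and any non-weight data."""
    # (params_m * 1e6 params) * bytes_per_param / (1e6 bytes per MB)
    return params_m * bytes_per_param


for name, row in variants.items():
    estimate = estimate_size_mb(row["params_m"])
    print(f"{name}: estimated {estimate:.1f} MB, listed {row['listed_mb']} MB")
```

The estimates (7.2 MB for T8, 85.4 MB for MA36) land close to the listed sizes. The remaining gap may come from package metadata or a different definition of MB, which this sketch does not model.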

Evaluation - Inference time

Variant	Device	OS version	Inference time (ms)	Dominant compute unit
T8	iPhone 12 Pro Max	iOS 17.5	0.79	Neural Engine
T8	M3 Max	macOS 14.4	0.62	Neural Engine
MA36	iPhone 12 Pro Max	iOS 18.0	4.50	Neural Engine
MA36	M3 Max	macOS 15.0	2.99	Neural Engine
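The inference times above can be converted into an approximate throughput figure. This is a minimal sketch that assumes one image per inference, run back to back, with no pre- or post-processing overhead, so the results are best read as upper bounds. The function name `images_per_second` is illustrative.

```python
# Inference times (ms) copied from the table above.
latency_ms = {
    ("T8", "iPhone 12 Pro Max"): 0.79,
    ("T8", "M3 Max"): 0.62,
    ("MA36", "iPhone 12 Pro Max"): 4.50,
    ("MA36", "M3 Max"): 2.99,
}


def images_per_second(ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms


for (variant, device), ms in latency_ms.items():
    print(f"{variant} on {device}: ~{images_per_second(ms):.0f} images/s")
```

Under these assumptions, MA36 on the iPhone 12 Pro Max works out to roughly 222 images/s and T8 to roughly 1266 images/s.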

Download

Install huggingface-cli with Homebrew:

brew install huggingface-cli

To download one of the .mlpackage folders to the models directory:

huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8
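The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and uses its `snapshot_download` function. The helpers `repo_id_for` and `download_fastvit` are hypothetical names introduced here, and the repository naming pattern is inferred from the two repositories named on this page (`apple/coreml-FastViT-T8` and `apple/coreml-FastViT-MA36`).

```python
def repo_id_for(variant: str) -> str:
    """Build the Hub repository id for a variant, e.g. 'T8' or 'MA36'.

    Assumption: repositories follow the 'apple/coreml-FastViT-<variant>' pattern.
    """
    return f"apple/coreml-FastViT-{variant}"


def download_fastvit(variant: str = "T8", local_dir: str = "models") -> str:
    """Download all files of the variant's repository into local_dir.

    Returns the local folder path reported by huggingface_hub.
    """
    # Imported lazily so the helpers above can be used without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(variant), local_dir=local_dir)


# Example (requires network access):
#   path = download_fastvit("MA36")
#   print(path)
```

Calling `download_fastvit("T8")` is intended to mirror the CLI command above, placing the .mlpackage folder under `models`.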

Citation

@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}
