FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Please observe the original license.

Model Details

Evaluation - Variants

Variant  Parameters  Size (MB)  Weight precision  Activation precision  Accuracy Δ vs. PyTorch
T8       3.6M        7.8        Float16           Float16               -0.9%
MA36     42.7M       84         Float16           Float16               -0.06%

Evaluation - Inference time

Variant  Device             OS          Inference time (ms)  Dominant compute unit
T8       iPhone 12 Pro Max  iOS 17.5    0.79                 Neural Engine
T8       M3 Max             macOS 14.4  0.62                 Neural Engine
MA36     iPhone 12 Pro Max  iOS 17.5    4.50                 Neural Engine
MA36     M3 Max             macOS 14.4  2.99                 Neural Engine
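
The latencies above can be read more easily as approximate images per second. The sketch below is a rough conversion. It assumes each prediction handles a single image and that predictions run back to back, with no batching or pipelining. The `throughput` helper is a hypothetical name used for illustration and is not part of any Apple or Hugging Face tooling.

```shell
#!/usr/bin/env bash
# Rough conversion of per-prediction latency (ms) into images per second.
# Assumption: one image per prediction, run sequentially (no batching).

# Hypothetical helper: print 1000 / latency_ms, rounded to a whole number.
throughput() {
  awk -v t="$1" 'BEGIN { printf "%.0f\n", 1000 / t }'
}

# Rows copied from the inference-time table: variant | device | latency (ms).
while IFS='|' read -r variant device latency_ms; do
  echo "${variant} on ${device}: ~$(throughput "$latency_ms") images/s"
done <<'EOF'
T8|iPhone 12 Pro Max|0.79
T8|M3 Max|0.62
MA36|iPhone 12 Pro Max|4.50
MA36|M3 Max|2.99
EOF
```

By this measure, MA36 is about 5.7x slower than T8 on the iPhone 12 Pro Max (4.50 / 0.79) and about 4.8x slower on the M3 Max (2.99 / 0.62).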

Download

Install huggingface-cli

brew install huggingface-cli

To download one of the .mlpackage folders to the models directory:

huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8 
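
The same command can be wrapped so it fetches either variant covered by this card. This is a minimal sketch, and it makes three assumptions:

- Both variants follow the repository naming seen above (apple/coreml-FastViT-T8 and apple/coreml-FastViT-MA36).
- As an illustrative choice, each variant is written to its own subdirectory of models, so files with the same name in the two repositories cannot overwrite each other.
- The `repo_id` and `download_variant` helpers are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a variant name (T8, MA36) to its Hugging Face repo id.
repo_id() {
  echo "apple/coreml-FastViT-$1"
}

# Hypothetical helper: download one variant into models/<variant>, using the
# same huggingface-cli flags as the command above.
download_variant() {
  local variant="$1"
  huggingface-cli download \
    --local-dir "models/${variant}" --local-dir-use-symlinks False \
    "$(repo_id "$variant")"
}
```

To fetch both variants, call it in a loop: for v in T8 MA36; do download_variant "$v"; done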

Citation

@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}