Image Classification
Core ML

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Please observe the original license.

Model Details

Evaluation - Variants

Variant  Parameters  Size (MB)  Weight precision  Activation precision  Δ PyTorch accuracy
T8       3.6M        7.8        Float16           Float16               -0.9%
MA36     42.7M       84         Float16           Float16               -0.06%

Evaluation - Inference time

Variant  Device             OS    Inference time (ms)  Dominant compute unit
T8       iPhone 12 Pro Max  17.5  0.79                 Neural Engine
T8       M3 Max             14.4  0.62                 Neural Engine
MA36     iPhone 12 Pro Max  18.0  4.50                 Neural Engine
MA36     M3 Max             15.0  2.99                 Neural Engine
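
To put the latencies in more familiar terms, the sketch below converts each one into an approximate number of predictions per second. It assumes each figure is the latency of a single prediction and that predictions run back to back with no pre- or post-processing, which real apps will not match exactly. The helper name to_throughput is hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# Latencies (ms) copied from the inference-time table: variant,device,latency_ms
latencies='T8,iPhone 12 Pro Max,0.79
T8,M3 Max,0.62
MA36,iPhone 12 Pro Max,4.50
MA36,M3 Max,2.99'

# Hypothetical helper: read "variant,device,latency_ms" rows on stdin and
# print the equivalent rate, computed as 1000 ms divided by the latency.
to_throughput() {
  awk -F',' '{ printf "%s on %s: %.0f predictions/s\n", $1, $2, 1000 / $3 }'
}

printf '%s\n' "$latencies" | to_throughput
```

These rates are an upper bound derived from the table, not a separate measurement.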

Download

Install huggingface-cli

brew install huggingface-cli

To download one of the .mlpackage folders to the models directory:

huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8 
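
After the download finishes, it can help to confirm which .mlpackage folders actually arrived. The following is a minimal sketch, assuming the packages were saved somewhere beneath the models directory used above. The helper names list_mlpackages and report_mlpackages are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every .mlpackage folder beneath a directory
# (defaults to "models", the --local-dir used in the download command).
list_mlpackages() {
  dir="${1:-models}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # A .mlpackage is a directory, so match directories only.
  find "$dir" -type d -name '*.mlpackage' | sort
}

# Hypothetical helper: print each package together with its on-disk size.
report_mlpackages() {
  list_mlpackages "${1:-models}" | while IFS= read -r pkg; do
    du -sh "$pkg"
  done
}
```

Running report_mlpackages models prints one line per package. If nothing is printed, the download may have gone to a different directory.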

Integrate in Swift apps

The huggingface/coreml-examples repository contains sample Swift code for coreml-FastViT-T8 and other models. See the instructions there to build the demo app, which shows how to use the model in your own Swift apps.

Citation

@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}