Apple FastVLM Vision Encoder (1024)

ONNX vision encoders for FastVLM image preprocessing and embedding extraction, provided in three precision variants so you can trade off accuracy, speed, and file size for your deployment.

Model files

File                            Precision  Size    Notes
vision_encoder_fp32.onnx        FP32       508 MB  Full precision, highest numerical fidelity
vision_encoder_safe_fp16.onnx   FP16       255 MB  Good balance of accuracy and size
vision_encoder_int8.onnx        INT8       129 MB  Smallest and fastest, best for constrained CPU deployment
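
As a quick check on the sizes listed above, the arithmetic below works out how many times smaller each file is than the FP32 one. The sizes are copied from the table; the names `SIZES_MB` and `size_ratios` are illustrative only.

```python
# File sizes in MB, copied from the model files table.
SIZES_MB = {
    "vision_encoder_fp32.onnx": 508,
    "vision_encoder_safe_fp16.onnx": 255,
    "vision_encoder_int8.onnx": 129,
}


def size_ratios(sizes, baseline="vision_encoder_fp32.onnx"):
    """Return how many times smaller each file is than the baseline file."""
    base = sizes[baseline]
    return {name: round(base / mb, 2) for name, mb in sizes.items()}


if __name__ == "__main__":
    for name, ratio in size_ratios(SIZES_MB).items():
        print(f"{name}: {ratio}x smaller than FP32")
```

By this measure the FP16 file is roughly half the size of FP32 and the INT8 file roughly a quarter.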

Usage

Use whichever variant best fits your accuracy/performance needs for image embedding generation in FastVLM-style CPU pipelines.

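import onnxruntime as ort

# Swap in whichever variant you need
session = ort.InferenceSession(
    "vision_encoder_fp32.onnx", providers=["CPUExecutionProvider"]
)
# Look up the input name instead of hard-coding it, then run inference on
# your preprocessed image tensor (pixel_values is a placeholder for it):
input_name = session.get_inputs()[0].name
# outputs = session.run(None, {input_name: pixel_values})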
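
This card does not state the expected input name, shape, or dtype, so one way to find them is to read the input metadata from the session and push a dummy tensor through. The following is a minimal sketch under stated assumptions: the model has a single input, and that input is float32 or float16. The helpers `concrete_shape` and `run_dummy` and the `ORT_TO_NUMPY` mapping are hypothetical names, not part of this repository.

```python
def concrete_shape(shape, fill=1):
    """Replace symbolic or unknown dimensions (str or None) with `fill`."""
    return [dim if isinstance(dim, int) else fill for dim in shape]


# ONNX Runtime reports element types as strings such as "tensor(float)".
# Assumption: the encoder input is one of these two types.
ORT_TO_NUMPY = {"tensor(float)": "float32", "tensor(float16)": "float16"}


def run_dummy(model_path):
    """Run one all-zeros input through the model and return the output shapes."""
    # Imported lazily so the helper above works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    meta = session.get_inputs()[0]  # assumption: the model has a single input
    dummy = np.zeros(concrete_shape(meta.shape), dtype=ORT_TO_NUMPY[meta.type])
    outputs = session.run(None, {meta.name: dummy})
    return [out.shape for out in outputs]
```

Calling `run_dummy("vision_encoder_fp32.onnx")` with the file in the working directory should report the output shapes, and the input shape read from the session tells you what resolution and layout your preprocessing needs to produce.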

Choosing a variant

  • FP32: use when accuracy matters most and memory and file size are not constraints.
  • FP16: a middle ground, about half the size of FP32 with minimal accuracy loss; often faster on hardware with native FP16 support.
  • INT8: use for the smallest footprint and fastest CPU inference, with some tradeoff in numerical precision.
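
The guidance above can be sketched as a small selection helper that returns the highest-precision file fitting a size budget. This is only an illustration: `pick_variant` and `VARIANTS` are hypothetical names, the sizes come from the model files table, and the helper looks at file size alone, not at measured latency or accuracy.

```python
# (file name, size in MB), ordered from highest to lowest precision.
# Sizes are taken from the model files table.
VARIANTS = [
    ("vision_encoder_fp32.onnx", 508),
    ("vision_encoder_safe_fp16.onnx", 255),
    ("vision_encoder_int8.onnx", 129),
]


def pick_variant(max_size_mb):
    """Return the highest-precision file whose size fits within the budget."""
    for filename, size_mb in VARIANTS:
        if size_mb <= max_size_mb:
            return filename
    raise ValueError(f"No variant fits within {max_size_mb} MB")
```

For example, a 300 MB budget selects the FP16 file, while anything under 129 MB raises an error because no listed variant is that small.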

Base model

Derived from apple/FastVLM-0.5B.
