Apple FastVLM Vision Encoder (1024)
ONNX vision encoders for FastVLM image preprocessing and embedding extraction, provided in three precision variants so you can trade off accuracy, speed, and file size for your deployment.
Model files
| File | Precision | Size | Notes |
|---|---|---|---|
| vision_encoder_fp32.onnx | FP32 | 508 MB | Full precision, highest numerical fidelity |
| vision_encoder_safe_fp16.onnx | FP16 | 255 MB | Good balance of accuracy and size |
| vision_encoder_int8.onnx | INT8 | 129 MB | Smallest and fastest, best for constrained CPU deployment |
Usage
Use whichever variant best fits your accuracy/performance needs for image embedding generation in FastVLM-style CPU pipelines.
```python
import onnxruntime as ort

# Swap in whichever variant you need
session = ort.InferenceSession("vision_encoder_fp32.onnx")
# Feed in your preprocessed image tensor and run inference
```
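The snippet above leaves the input name and tensor layout unspecified, and this model card does not document them. A minimal sketch of reading them from the session instead of hard-coding them follows. It assumes the first graph input is the image tensor; `run_encoder` and `describe_inputs` are hypothetical helpers, not part of FastVLM or ONNX Runtime.

```python
def describe_inputs(session):
    """Return (name, shape, type) for every model input, for inspection.

    `session` is expected to be an onnxruntime.InferenceSession.
    """
    return [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]


def run_encoder(session, image_tensor):
    """Run one forward pass and return the first output.

    Assumption: the first graph input is the preprocessed image tensor.
    The input name is read from the model rather than hard-coded.
    """
    input_meta = session.get_inputs()[0]
    # Passing None as the output list asks ONNX Runtime for all outputs.
    outputs = session.run(None, {input_meta.name: image_tensor})
    return outputs[0]
```

With ONNX Runtime installed, calling `describe_inputs(session)` on a session created from one of the files above returns the input name, shape, and element type the model expects, which tells you how to shape the preprocessed tensor before passing it to `run_encoder`.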
Choosing a variant
- FP32: use when accuracy matters most and memory/size isn't a constraint.
- FP16: a middle ground that is smaller and often faster than FP32, with minimal accuracy loss on supporting hardware.
- INT8: use for the smallest footprint and fastest CPU inference, with some tradeoff in numerical precision.
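If file size is the deciding constraint, the guidance above can be sketched as a small lookup. The filenames and sizes come from the "Model files" table; `pick_variant` is a hypothetical helper that considers only the size budget, not measured accuracy or latency.

```python
# (precision, filename, size in MB), ordered from highest to lowest fidelity.
VARIANTS = [
    ("FP32", "vision_encoder_fp32.onnx", 508),
    ("FP16", "vision_encoder_safe_fp16.onnx", 255),
    ("INT8", "vision_encoder_int8.onnx", 129),
]


def pick_variant(max_size_mb):
    """Return the highest-precision variant whose file fits the size budget."""
    for precision, filename, size_mb in VARIANTS:
        if size_mb <= max_size_mb:
            return precision, filename
    raise ValueError(f"no variant fits within {max_size_mb} MB")


print(pick_variant(600))  # ('FP32', 'vision_encoder_fp32.onnx')
print(pick_variant(300))  # ('FP16', 'vision_encoder_safe_fp16.onnx')
print(pick_variant(150))  # ('INT8', 'vision_encoder_int8.onnx')
```

The returned filename can be passed straight to `ort.InferenceSession`. Going by the listed sizes, FP16 is roughly half the size of FP32 and INT8 roughly a quarter.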
Base model
Derived from apple/FastVLM-0.5B.