SD Turbo ONNX Q8 Static ARM64

Source model: stabilityai/sd-turbo

Export:

ONNX export from the original Diffusers model in FP32.
Static Q8 (8-bit) quantization with ONNX Runtime, aimed at ARM64 testing.
Quantized ops: Conv, MatMul, Gemm.
Quantization format: QOperator (QLinearConv, QLinearMatMul).
Activation type: QUInt8.
Weight type: QInt8.
Per-channel weights: enabled.
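
As a rough illustration of how the settings above map onto ONNX Runtime's static quantization API, here is a minimal sketch. It is not the script used to produce this export. The function name quantize_submodel, the file paths, and the calibration reader are hypothetical; only the option values come from the list above.

```python
# Option values copied from the export notes; names below are illustrative only.
QUANT_SETTINGS = {
    "op_types_to_quantize": ["Conv", "MatMul", "Gemm"],
    "quant_format": "QOperator",
    "activation_type": "QUInt8",
    "weight_type": "QInt8",
    "per_channel": True,
}


def quantize_submodel(fp32_path, q8_path, calibration_reader):
    """Quantize one FP32 ONNX file with the settings above (hypothetical helper).

    calibration_reader is assumed to be a CalibrationDataReader subclass
    that yields representative input feeds for the model.
    """
    # Imported lazily so the settings can be inspected without onnxruntime installed.
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        model_input=fp32_path,
        model_output=q8_path,
        calibration_data_reader=calibration_reader,
        quant_format=QuantFormat[QUANT_SETTINGS["quant_format"]],
        activation_type=QuantType[QUANT_SETTINGS["activation_type"]],
        weight_type=QuantType[QUANT_SETTINGS["weight_type"]],
        per_channel=QUANT_SETTINGS["per_channel"],
        op_types_to_quantize=QUANT_SETTINGS["op_types_to_quantize"],
    )
```

Keeping the settings in a plain dictionary makes it easy to compare them against the list above before running a long calibration pass.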

Known notes:

This is not an OpenVINO INT8 export.
It is intended for ONNX Runtime Android benchmarking.
Calibration was minimal and prompt-oriented; quality should be validated on-device before production use.
For scheduler parity with Diffusers, use Euler ancestral with trailing timestep spacing, e.g. for 4 steps: [999, 749, 499, 249].
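
The example timesteps in the scheduler note can be reproduced with a few lines of arithmetic. This is a sketch of trailing spacing as I understand it from Diffusers, assuming 1000 training timesteps; the function name trailing_timesteps is made up for illustration and is not part of any library.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Return trailing-spaced timesteps, highest first.

    Assumes 1000 training timesteps by default (not stated in this card).
    """
    step_ratio = num_train_timesteps / num_inference_steps
    # Start at the end of the training schedule, step down evenly,
    # then shift by one so the first timestep is num_train_timesteps - 1.
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


print(trailing_timesteps(4))  # [999, 749, 499, 249]
```

A runtime that starts from a different first timestep (for example 750 instead of 999) is using a different spacing and should not be expected to match Diffusers output.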

Local smoke test:

.venv-onnx/bin/python onnx/onnx_txt2img_sdturbo_official.py \
  --model-dir onnx/pipeline_runs/sd-turbo-q8-ort/onnx-q8-static-arm64 \
  --prompt "A red sports car on a mountain road at sunrise" \
  --scheduler euler-ancestral \
  --seed 1234 \
  --width 512 \
  --height 512 \
  --steps 4 \
  --latent-rng torch
