SD Turbo ONNX Q8 Static ARM64

Source model: stabilityai/sd-turbo

Export:

  • ONNX export from the original Diffusers model in FP32.
  • Static 8-bit (Q8) quantization with ONNX Runtime, intended for testing on ARM64.
  • Quantized ops: Conv, MatMul, Gemm.
  • Quantization format: QOperator (QLinearConv, QLinearMatMul).
  • Activation type: QUInt8.
  • Weight type: QInt8.
  • Per-channel weights: enabled.
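
The settings above map onto the arguments of `onnxruntime.quantization.quantize_static`. The following is a minimal sketch of how such an export could be reproduced, not the script actually used for this model. The helper name `quantize_component`, the file paths, and the shape of the calibration feeds are assumptions made for illustration; the input names of each exported component are not listed in this card.

```python
from typing import Dict, Iterable

# Settings copied from the export list above, stored as plain names so this
# module can be imported even when ONNX Runtime is not installed.
QUANT_SETTINGS = {
    "quant_format": "QOperator",
    "activation_type": "QUInt8",
    "weight_type": "QInt8",
    "op_types_to_quantize": ["Conv", "MatMul", "Gemm"],
    "per_channel": True,
}


def quantize_component(
    fp32_path: str,
    q8_path: str,
    calibration_feeds: Iterable[Dict[str, object]],
) -> None:
    """Statically quantize one exported ONNX file (hypothetical helper).

    calibration_feeds yields dicts of {input_name: numpy array}. The input
    names depend on the export and are an assumption of the caller.
    """
    # Imported lazily: requires the onnxruntime package.
    from onnxruntime.quantization import (
        CalibrationDataReader,
        QuantFormat,
        QuantType,
        quantize_static,
    )

    class FeedReader(CalibrationDataReader):
        """Feeds pre-built input dicts to the calibrator one at a time."""

        def __init__(self, feeds):
            self._feeds = iter(feeds)

        def get_next(self):
            # Returning None tells the calibrator that the data is exhausted.
            return next(self._feeds, None)

    quantize_static(
        fp32_path,
        q8_path,
        FeedReader(calibration_feeds),
        quant_format=QuantFormat[QUANT_SETTINGS["quant_format"]],
        activation_type=QuantType[QUANT_SETTINGS["activation_type"]],
        weight_type=QuantType[QUANT_SETTINGS["weight_type"]],
        op_types_to_quantize=QUANT_SETTINGS["op_types_to_quantize"],
        per_channel=QUANT_SETTINGS["per_channel"],
    )
```

A component whose FP32 file exceeds 2 GB would likely also need `use_external_data_format=True`; whether that applies here depends on how the FP32 export was split.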

Notes:

  • This is not an OpenVINO INT8 export.
  • It is intended for ONNX Runtime Android benchmarking.
  • Calibration was minimal and used only a small set of prompts; validate image quality on-device before production use.
  • To match Diffusers output, the scheduler must use Euler ancestral with trailing timestep spacing, e.g. for 4 steps the timesteps are [999, 749, 499, 249].
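
The timestep list in the last note follows from the trailing spacing arithmetic. A small pure-Python sketch, assuming the usual 1000 training timesteps (the function name `trailing_timesteps` is made up for this example and is not part of Diffusers):

```python
def trailing_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list:
    """Timesteps for 'trailing' spacing: count down from the last training step.

    Mirrors the arithmetic round(arange(N, 0, -N / steps)) - 1 for step counts
    that divide num_train_timesteps evenly.
    """
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


print(trailing_timesteps(4))  # [999, 749, 499, 249]
```

A custom scheduler on the Android side can be checked against this list before comparing images.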

Local smoke test:

.venv-onnx/bin/python onnx/onnx_txt2img_sdturbo_official.py \
  --model-dir onnx/pipeline_runs/sd-turbo-q8-ort/onnx-q8-static-arm64 \
  --prompt "A red sports car on a mountain road at sunrise" \
  --scheduler euler-ancestral \
  --seed 1234 \
  --width 512 \
  --height 512 \
  --steps 4 \
  --latent-rng torch
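
Before running the smoke test, it can help to confirm that a quantized file really contains QOperator nodes. This is a rough sketch using the `onnx` package; the helper names and the example path are hypothetical, since the file layout inside the model directory is not described in this card.

```python
from collections import Counter
from typing import Iterable

# Operator names produced by the QOperator format, as listed in the export notes.
QUANTIZED_OPS = ("QLinearConv", "QLinearMatMul")


def count_op_types(op_types: Iterable[str]) -> Counter:
    """Histogram of operator type names."""
    return Counter(op_types)


def has_quantized_ops(counts: Counter) -> bool:
    """True if at least one QOperator-style node is present."""
    return any(counts[op] > 0 for op in QUANTIZED_OPS)


def summarize_model(path: str) -> Counter:
    """Count top-level node types in an ONNX file without loading weights."""
    # Imported lazily: requires the onnx package.
    import onnx

    model = onnx.load(path, load_external_data=False)
    return count_op_types(node.op_type for node in model.graph.node)


# Example (hypothetical path; adjust to the actual file name):
# counts = summarize_model("onnx-q8-static-arm64/unet/model.onnx")
# print(counts.most_common(10), has_quantized_ops(counts))
```

Only top-level graph nodes are counted; nodes inside subgraphs are ignored.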