FlashStereo — INT8 calibration weights

Pre-built TensorRT INT8 engine and calibration cache for the FlashStereo pipeline.

⚠️ Hardware / software compatibility

The .engine file is a serialized TensorRT binary and is NOT portable across GPU architectures, TensorRT versions, or CUDA versions. It will load only on the exact configuration it was built against:

Requirement   Value
GPU           Jetson AGX Orin (Ampere SM87)
JetPack       6.0
TensorRT      10.3
CUDA          12.6
Input shape   480 × 640
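
A quick pre-flight comparison can catch a mismatch before a deserialization failure does. The sketch below is not part of the FlashStereo repo: `EXPECTED`, `major_minor`, and `find_mismatches` are hypothetical names, the values are copied from the table above, and matching on major.minor only is an assumption about how strict the check needs to be.

```python
# Hypothetical pre-flight check; expected values copied from the compatibility table.
EXPECTED = {
    "sm": "87",          # Jetson AGX Orin (Ampere SM87)
    "tensorrt": "10.3",
    "cuda": "12.6",
}


def major_minor(version: str) -> str:
    """Reduce '10.3.0.30' to '10.3'; short strings such as '87' pass through."""
    parts = version.strip().split(".")
    return ".".join(parts[:2])


def find_mismatches(local: dict) -> list:
    """Return human-readable differences between `local` and EXPECTED."""
    problems = []
    for key, want in EXPECTED.items():
        have = local.get(key)
        if have is None:
            problems.append(f"{key}: unknown locally, expected {want}")
        elif major_minor(str(have)) != want:
            problems.append(f"{key}: found {have}, expected {want}")
    return problems


if __name__ == "__main__":
    # Example values for an RTX 4090 workstation (SM89); replace with your own.
    local = {"sm": "89", "tensorrt": "10.3.0.30", "cuda": "12.6"}
    for line in find_mismatches(local):
        print("rebuild needed ->", line)
```

On a real system the TensorRT entry could be filled from `tensorrt.__version__`; an empty result only means the listed versions match, not that the engine is guaranteed to load.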

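On any other hardware or software version (Thor SM110, RTX 4090/5090, Orin on JetPack 6.1+, etc.) the engine will fail to deserialize, and you must rebuild it locally with the FlashStereo repo's build scripts. The .calib file in this release is not tied to a GPU architecture, so you can reuse it to skip the roughly 5-minute INT8 calibration step. TensorRT does not guarantee that calibration caches carry over between TensorRT releases, so if the build rejects the cache or accuracy regresses, recalibrate from scratch as described under "Hardware target". To rebuild with the shipped cache: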

pip install huggingface_hub
huggingface-cli download saofund/flashstereo-int8-orin \
    calib_cache/feature_runner_int8.engine.calib --local-dir .
python scripts/build_int8.py \
    --onnx /path/to/feature_runner.onnx \
    --engine-out artifacts/feature_runner_int8.engine \
    --cache calib_cache/feature_runner_int8.engine.calib \
    --calib-dir assets/calib_pairs

These weights are derived from the FoundationStereo two-stage ONNX export, calibrated on 16 stereo pairs from the public Middlebury 2014 Stereo dataset (the "perfect" subset, resized to 480×640 grayscale).

What's here

engines/
  feature_runner_int8.engine          # 19 MB, hardware-specific (see compatibility table above)
calib_cache/
  feature_runner_int8.engine.calib    # 50 KB TRT entropy cache (portable across GPUs; not guaranteed across TensorRT releases)
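
To see what the cache contains, it can be read as text. TensorRT does not document the cache layout as a stable format; the sketch below assumes the layout commonly seen in entropy caches: a first line of the form `TRT-<version digits>-<calibrator name>`, then one `tensor name: hex` line per tensor, where the eight hex digits are the big-endian IEEE-754 bits of a float32 scale. `parse_calib_cache` is a hypothetical helper, and the sample string is synthetic rather than taken from the released file.

```python
import struct


def parse_calib_cache(text: str):
    """Parse a TensorRT-style calibration cache given as text.

    Assumed layout (observed in practice, not a documented stable format):
      line 1:  TRT-<version digits>-<calibrator name>
      others:  <tensor name>: <8 hex digits of a big-endian float32 scale>
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("-", 2)
    if len(header) != 3 or header[0] != "TRT":
        raise ValueError(f"unexpected header: {lines[0]!r}")
    scales = {}
    for ln in lines[1:]:
        # Tensor names may themselves contain ':', so split on the last one.
        name, _, hex_bits = ln.rpartition(":")
        hex_bits = hex_bits.strip()
        if not name or len(hex_bits) != 8:
            continue  # skip lines that do not match the assumed layout
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hex_bits))[0]
    return header[1], header[2], scales


if __name__ == "__main__":
    # Synthetic sample, not taken from the released file.
    sample = "TRT-8601-EntropyCalibration2\ninput_left: 3f800000\nfeat/conv1: 3c000000\n"
    version, calibrator, scales = parse_calib_cache(sample)
    print(version, calibrator, scales)
```

For the released file, the same function could be called on the contents of calib_cache/feature_runner_int8.engine.calib; the version token in the header indicates which TensorRT build wrote the cache.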

Hardware target

The engine was built against Jetson AGX Orin (Ampere SM87) with TensorRT 10.3 / CUDA 12.6. If your target differs in GPU architecture, TensorRT version, or CUDA version, regenerate the engine with the build scripts in the FlashStereo repo. To calibrate from scratch instead of reusing the shipped cache, download the 16 calibration pairs and run the build without --cache:

python scripts/download_calib_data.py --out-dir assets/calib_pairs --n 16
python scripts/build_int8.py \
    --onnx /path/to/feature_runner.onnx \
    --engine-out artifacts/feature_runner_int8.engine \
    --calib-dir assets/calib_pairs
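
The two build_int8.py invocations in this card differ only in whether `--cache` is passed. A small wrapper can pick between them depending on whether the cache file is present. This is a sketch under assumptions: `build_int8_command` is a hypothetical helper, it uses only the flags shown in this card, and it assumes that omitting `--cache` makes the script calibrate from the images in `--calib-dir`.

```python
import sys
from pathlib import Path


def build_int8_command(onnx, engine_out, calib_dir, cache=None):
    """Assemble the argv for scripts/build_int8.py from flags shown in this card.

    `--cache` is appended only when the cache file exists, so a missing cache
    yields the from-scratch invocation instead.
    """
    cmd = [
        sys.executable, "scripts/build_int8.py",
        "--onnx", str(onnx),
        "--engine-out", str(engine_out),
        "--calib-dir", str(calib_dir),
    ]
    if cache is not None and Path(cache).is_file():
        cmd += ["--cache", str(cache)]
    return cmd


if __name__ == "__main__":
    cmd = build_int8_command(
        onnx="/path/to/feature_runner.onnx",
        engine_out="artifacts/feature_runner_int8.engine",
        calib_dir="assets/calib_pairs",
        cache="calib_cache/feature_runner_int8.engine.calib",
    )
    # Print only; hand the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```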

What's NOT here

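post_runner_int8.engine is intentionally omitted from this release: building it takes ~45 minutes of GPU time and produces a 14 MB engine. Users on the target hardware can build it themselves in two steps once they have the FP16 feature_runner and post_runner engines. The first command generates calibration inputs for the post stage, and the second builds the INT8 engine from them: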

python scripts/gen_post_calib_data.py \
    --feat-engine /path/to/feature_runner.engine \
    --post-engine /path/to/post_runner.engine \
    --calib-dir assets/calib_pairs \
    --out-dir artifacts/post_calib

python scripts/build_int8_post.py \
    --onnx /path/to/post_runner.onnx \
    --engine-out artifacts/post_runner_int8.engine \
    --npz-dir artifacts/post_calib
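
The two commands are coupled through one directory: the `--out-dir` of the first step is the `--npz-dir` of the second. A minimal driver sketch that keeps them in sync and stops at the first failure is shown here; `post_int8_steps` and `run_steps` are hypothetical helpers that use only the script names and flags from the commands above.

```python
import subprocess
import sys


def post_int8_steps(feat_engine, post_engine, post_onnx, calib_dir, work_dir, engine_out):
    """Return the two argv lists from this section, sharing one intermediate directory."""
    gen = [sys.executable, "scripts/gen_post_calib_data.py",
           "--feat-engine", feat_engine,
           "--post-engine", post_engine,
           "--calib-dir", calib_dir,
           "--out-dir", work_dir]
    build = [sys.executable, "scripts/build_int8_post.py",
             "--onnx", post_onnx,
             "--engine-out", engine_out,
             "--npz-dir", work_dir]  # must match --out-dir of the first step
    return [gen, build]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; check=True raises CalledProcessError on the first failure."""
    for argv in steps:
        runner(argv, check=True)


if __name__ == "__main__":
    steps = post_int8_steps(
        feat_engine="/path/to/feature_runner.engine",
        post_engine="/path/to/post_runner.engine",
        post_onnx="/path/to/post_runner.onnx",
        calib_dir="assets/calib_pairs",
        work_dir="artifacts/post_calib",
        engine_out="artifacts/post_runner_int8.engine",
    )
    for argv in steps:
        print(" ".join(argv))  # dry run; call run_steps(steps) to execute
```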

The FlashStereo FP16 path alone yields a 1.92× speed-up over the stock pipeline with bit-identical output; see the repo README for details.

Provenance

Item                        Value
Calibration data            Middlebury 2014 "perfect" subset (16 scenes, resized to 480×640 grayscale)
Calibration algorithm       TensorRT IInt8EntropyCalibrator2
Source ONNX                 FoundationStereo two-stage feature_runner.onnx (480×640 dynamic batch)
Target                      Jetson AGX Orin SM87, TensorRT 10.3, CUDA 12.6
Disparity quality vs. FP16  cosine 0.999998, mean L1 0.04 px, rel L1 0.06%
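
To run a similar comparison on a rebuilt engine, the three disparity-quality figures can be computed from a pair of disparity maps. The card does not define the metrics, so the formulas below are assumptions: cosine similarity of the flattened maps, mean absolute difference in pixels, and summed absolute difference divided by the summed reference magnitude. `disparity_metrics` is a hypothetical helper written in plain Python for clarity; it assumes neither map is all zeros.

```python
import math


def disparity_metrics(test, ref):
    """Compare two equal-length flattened disparity maps (values in pixels).

    Metric definitions are assumptions; the card does not spell them out:
      cosine  = <test, ref> / (|test| * |ref|)
      mean L1 = mean(|test - ref|)
      rel L1  = sum(|test - ref|) / sum(|ref|)
    """
    if len(test) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    dot = sum(t * r for t, r in zip(test, ref))
    norm = math.sqrt(sum(t * t for t in test)) * math.sqrt(sum(r * r for r in ref))
    abs_err = sum(abs(t - r) for t, r in zip(test, ref))
    return {
        "cosine": dot / norm,
        "mean_l1_px": abs_err / len(ref),
        "rel_l1": abs_err / sum(abs(r) for r in ref),
    }


if __name__ == "__main__":
    # Tiny synthetic example: FP16 output as reference, INT8 output as test.
    fp16 = [10.0, 20.0, 40.0, 80.0]
    int8 = [10.5, 19.5, 40.0, 80.0]
    print(disparity_metrics(int8, fp16))
```

Under these definitions, a mean L1 of 0.04 px alongside a relative L1 of 0.06% would correspond to a mean reference disparity of roughly 67 px (0.04 / 0.0006).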

License

Apache 2.0, matching the FlashStereo source repo. The original model weights and ONNX are governed by the FoundationStereo license.
