FlashStereo — INT8 calibration weights
Pre-built TensorRT INT8 engine and calibration cache for the FlashStereo pipeline.
⚠️ Hardware / software compatibility
The .engine file is a serialized TensorRT binary and is NOT portable across GPU architectures, TensorRT versions, or CUDA versions. It will load only on the exact configuration it was built against:

| Requirement | Value |
|---|---|
| GPU | Jetson AGX Orin (Ampere SM87) |
| JetPack | 6.0 |
| TensorRT | 10.3 |
| CUDA | 12.6 |
| Input shape | 480 × 640 |

For any other hardware or software version (Thor SM110, RTX 4090/5090, Orin on JetPack 6.1+, etc.), the engine will fail to deserialize, and you must rebuild it locally using the FlashStereo repo's build scripts. The .calib file in this release is portable, so you can reuse it to skip the 5-minute INT8 calibration step:

pip install huggingface_hub
huggingface-cli download saofund/flashstereo-int8-orin \
calib_cache/feature_runner_int8.engine.calib --local-dir .
python scripts/build_int8.py \
--onnx /path/to/feature_runner.onnx \
--engine-out artifacts/feature_runner_int8.engine \
--cache calib_cache/feature_runner_int8.engine.calib \
--calib-dir assets/calib_pairs
These weights are derived from the FoundationStereo two-stage ONNX export, calibrated on 16 stereo pairs from the public Middlebury 2014 Stereo dataset (the "perfect" subset, resized to 480×640 grayscale).
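The card states only that the calibration pairs were resized to 480×640 and converted to grayscale. As a sketch of that preprocessing, the function below assumes 480 rows × 640 columns, ITU-R BT.601 luma weights, and simple floor-index nearest-neighbour resampling; the repo's actual interpolation and colour conversion are not documented here, and prepare_calib_image is a hypothetical name. In practice an image library such as OpenCV or Pillow would do this much faster; pure Python keeps the example dependency-free.

```python
def prepare_calib_image(rgb, out_h=480, out_w=640):
    """Convert an RGB image (list of rows of (r, g, b) tuples, values 0-255)
    into an out_h x out_w grayscale image (list of rows of ints, 0-255).

    Assumptions: BT.601 luma weights and nearest-neighbour resampling.
    """
    in_h, in_w = len(rgb), len(rgb[0])
    gray = []
    for y in range(out_h):
        src_row = rgb[y * in_h // out_h]          # source row chosen by floor mapping
        row = []
        for x in range(out_w):
            r, g, b = src_row[x * in_w // out_w]  # source column chosen by floor mapping
            row.append(round(0.299 * r + 0.587 * g + 0.114 * b))
        gray.append(row)
    return gray


# Example: a 2 x 2 image upscaled to 4 x 4 (use the defaults for the 480 x 640 target).
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
for row in prepare_calib_image(tiny, out_h=4, out_w=4):
    print(row)
```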
What's here
engines/
  feature_runner_int8.engine          # 19 MB, hardware-specific (see compatibility table above)
calib_cache/
  feature_runner_int8.engine.calib    # 50 KB TRT entropy cache (portable across hardware/TRT versions)
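The .calib file is small because it holds one INT8 scale per tensor rather than any weights. Assuming TensorRT's usual plain-text cache layout (a header line naming the calibrator, then one `tensor_name: hex` line per tensor, where the hex digits are the bit pattern of a float32 scale), the sketch below parses such a file and applies a scale with a symmetric [-127, 127] quantizer. The layout details, the sample tensor name left, and the helper names are assumptions for illustration; TensorRT's own rounding and clamping rules may differ.

```python
import struct


def parse_calib_cache(text):
    """Return (header, {tensor_name: scale}) from calibration-cache text.

    Assumed layout: the first non-empty line is a header; each later line is
    '<tensor name>: <float32 bits as hex>'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, scales = lines[0], {}
    for line in lines[1:]:
        name, hex_bits = line.rsplit(":", 1)
        bits = int(hex_bits.strip(), 16)  # also tolerates dropped leading zeros
        scales[name.strip()] = struct.unpack(">f", struct.pack(">I", bits))[0]
    return header, scales


def quantize_int8(x, scale):
    """Symmetric INT8 quantization: round(x / scale), clamped to [-127, 127]."""
    return max(-127, min(127, round(x / scale)))


# Synthetic example: 0x3c010204 is the float32 nearest to 1/127, i.e. the scale
# for a tensor whose calibrated range is [-1, 1].
header, scales = parse_calib_cache("EXAMPLE-HEADER\nleft: 3c010204\n")
print(header, scales)                       # scale is approximately 1/127
print(quantize_int8(0.25, scales["left"]))  # 0.25 / (1/127) is about 31.75, rounds to 32
```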
Hardware target
The engine was built for Jetson AGX Orin (Ampere SM87) with TensorRT 10.3 and CUDA 12.6. TensorRT engines are NOT portable across GPU architectures, driver versions, or TensorRT versions. If your target hardware differs, regenerate the engine with the build scripts in the FlashStereo repo; the commands below download 16 calibration pairs and run a fresh INT8 calibration:
python scripts/download_calib_data.py --out-dir assets/calib_pairs --n 16
python scripts/build_int8.py \
--onnx /path/to/feature_runner.onnx \
--engine-out artifacts/feature_runner_int8.engine \
--calib-dir assets/calib_pairs
What's NOT here
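post_runner_int8.engine is intentionally omitted from this release: building it takes ~45 minutes of GPU time and produces a 14 MB engine.
Users on the target hardware can build it themselves once they have the FP16 feature_runner and post_runner engines. The first command below generates calibration data for the post runner; the second builds the INT8 engine from that data: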
python scripts/gen_post_calib_data.py \
--feat-engine /path/to/feature_runner.engine \
--post-engine /path/to/post_runner.engine \
--calib-dir assets/calib_pairs \
--out-dir artifacts/post_calib
python scripts/build_int8_post.py \
--onnx /path/to/post_runner.onnx \
--engine-out artifacts/post_runner_int8.engine \
--npz-dir artifacts/post_calib
The FlashStereo FP16 path alone yields a 1.92× speed-up over the stock pipeline with bit-identical output; see the repo README for details.
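Read as a latency ratio, the 1.92× figure means the FP16 path needs a little over half the per-frame time of the stock pipeline. This assumes the speed-up is defined as stock latency divided by FlashStereo FP16 latency, which the card does not state explicitly:

```latex
S = \frac{t_{\text{stock}}}{t_{\text{FP16}}} = 1.92
\quad\Longrightarrow\quad
\frac{t_{\text{FP16}}}{t_{\text{stock}}} = \frac{1}{1.92} \approx 0.521,
\qquad
1 - \frac{1}{1.92} \approx 0.479
```

That is roughly 48% less time per frame, or equivalently 1.92× the frames per second.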
Provenance
| Item | Value |
|---|---|
| Calibration data | Middlebury 2014 "perfect" subset (16 scenes, resized to 480×640 grayscale) |
| Calibration algorithm | TensorRT IInt8EntropyCalibrator2 |
| Source ONNX | FoundationStereo two-stage feature_runner.onnx (480×640 dynamic batch) |
| Target | Jetson AGX Orin SM87, TensorRT 10.3, CUDA 12.6 |
| Disparity quality vs. FP16 | cosine similarity 0.999998, mean L1 error 0.04 px, relative L1 error 0.06% |
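The card does not spell out how the quality figures are computed. A minimal sketch under the usual definitions, which are assumptions here: cosine similarity between the flattened INT8 and FP16 disparity maps, mean absolute difference in pixels, and relative L1 as total absolute error divided by total reference disparity. disparity_agreement is a hypothetical helper, and the maps are plain Python lists of floats.

```python
import math


def disparity_agreement(pred, ref):
    """Compare a disparity map (pred, e.g. INT8) with a reference (ref, e.g. FP16).

    Both maps are flattened sequences of disparities in pixels.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("maps must be non-empty and the same size")
    dot = sum(p * r for p, r in zip(pred, ref))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(r * r for r in ref))
    abs_err = [abs(p - r) for p, r in zip(pred, ref)]
    return {
        "cosine": dot / norm,                               # 1.0 means same direction
        "mean_l1_px": sum(abs_err) / len(abs_err),          # average |error| in pixels
        "rel_l1": sum(abs_err) / sum(abs(r) for r in ref),  # fraction of total disparity
    }


# Toy 3-pixel example (made-up values, not FlashStereo output).
print(disparity_agreement([11.0, 19.0, 30.0], [10.0, 20.0, 30.0]))
```

Under these definitions, the reported 0.06% corresponds to rel_l1 = 0.0006.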
License
Apache 2.0, matching the FlashStereo source repo. The original model weights and ONNX are governed by the FoundationStereo license.