tlwu/sd-turbo-onnxruntime


tlwu committed on Dec 31, 2023

Commit b15decc (1 parent: c110ffc)

update doc

Files changed (1)

README.md +9 -9


@@ -14,11 +14,11 @@ tags:
   - text-to-image
 ---
-# Stable Diffusion XL Turbo for ONNX Runtime
 ## Introduction
-This repository hosts the optimized versions of **SD Turbo** to accelerate inference with ONNX Runtime CUDA execution provider.
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with command like the following:
 ```
@@ -34,18 +34,18 @@ See the [usage instructions](#usage-example) for how to run the SDXL pipeline wi
 - **License:** [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
 - **Model Description:** This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with CUDA execution provider.
-## Performance Comparison
 #### Latency
 Below is average latency of generating an image of size 512x512 using NVIDIA A100-SXM4-80GB GPU:
-| Engine      | Batch Size | Steps | PyTorch 2.1     | ONNX Runtime CUDA |
-|-------------|------------|------ | ----------------|-------------------|
-| Static      | 1          |   1   | 85.3 ms         |  38.2 ms          |
-| Static      | 4          |   1   | 213.8 ms        | 120.2 ms          |
-| Static      | 1          |   4   | 117.4 ms        |  68.7 ms          |
-| Static      | 4          |   4   | 294.3 ms        | 192.6 ms          |
 Static means the engine is built for the given batch size and image size combination, and CUDA graph is used to speed up.

   - text-to-image
 ---
+# Stable Diffusion XL Turbo for ONNX Runtime CUDA
 ## Introduction
+This repository hosts the optimized ONNX models of **SD Turbo** to accelerate inference with ONNX Runtime CUDA execution provider for Nvidia GPUs. It cannot run in other providers like CPU and DirectML.
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with command like the following:
 ```
 - **License:** [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
 - **Model Description:** This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with CUDA execution provider.
+## Performance
 #### Latency
 Below is average latency of generating an image of size 512x512 using NVIDIA A100-SXM4-80GB GPU:
+| Engine      | Batch Size | Steps | ONNX Runtime CUDA |
+|-------------|------------|------ | ----------------- |
+| Static      | 1          |   1   | 38.2 ms           |
+| Static      | 4          |   1   | 120.2 ms          |
+| Static      | 1          |   4   | 68.7 ms           |
+| Static      | 4          |   4   | 192.6 ms          |
 Static means the engine is built for the given batch size and image size combination, and CUDA graph is used to speed up.
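The updated introduction says these models run only with the ONNX Runtime CUDA execution provider, not with CPU or DirectML. A minimal sketch of how a caller might check for that before loading anything is shown below. The helper name `pick_provider` and its error message are hypothetical and not part of this repository; `onnxruntime.get_available_providers()` is the standard ONNX Runtime call for listing the providers in the installed build.

```python
# Sketch only: pick_provider is a hypothetical helper, not part of this repo.
REQUIRED_PROVIDER = "CUDAExecutionProvider"


def pick_provider(available):
    """Return the CUDA provider name if it is in `available`, else raise."""
    if REQUIRED_PROVIDER not in available:
        raise RuntimeError(
            f"{REQUIRED_PROVIDER} not found; available providers: {list(available)}"
        )
    return REQUIRED_PROVIDER


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # needs the GPU build (onnxruntime-gpu) for CUDA
    except ImportError:
        print("onnxruntime is not installed")
    else:
        try:
            print(pick_provider(ort.get_available_providers()))
        except RuntimeError as err:
            print(f"Cannot run these models here: {err}")
```

The returned name could then be passed in the `providers` list when creating an `onnxruntime.InferenceSession`, so a missing CUDA build fails early with a clear message rather than falling back silently.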