tlwu committed on
Commit
b15decc
1 Parent(s): c110ffc

update doc

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -14,11 +14,11 @@ tags:
 - text-to-image
 ---
 
-# Stable Diffusion XL Turbo for ONNX Runtime
+# Stable Diffusion XL Turbo for ONNX Runtime CUDA
 
 ## Introduction
 
-This repository hosts the optimized versions of **SD Turbo** to accelerate inference with ONNX Runtime CUDA execution provider.
+This repository hosts the optimized ONNX models of **SD Turbo** to accelerate inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. The models cannot run with other execution providers such as CPU or DirectML.
 
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with a command like the following:
 ```
@@ -34,18 +34,18 @@ See the [usage instructions](#usage-example) for how to run the SDXL pipeline wi
 - **License:** [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
 - **Model Description:** This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.
 
-## Performance Comparison
+## Performance
 
 #### Latency
 
 Below is the average latency of generating a 512x512 image on an NVIDIA A100-SXM4-80GB GPU:
 
-| Engine | Batch Size | Steps | PyTorch 2.1 | ONNX Runtime CUDA |
-|--------|------------|-------|-------------|-------------------|
-| Static | 1          | 1     | 85.3 ms     | 38.2 ms           |
-| Static | 4          | 1     | 213.8 ms    | 120.2 ms          |
-| Static | 1          | 4     | 117.4 ms    | 68.7 ms           |
-| Static | 4          | 4     | 294.3 ms    | 192.6 ms          |
+| Engine | Batch Size | Steps | ONNX Runtime CUDA |
+|--------|------------|-------|-------------------|
+| Static | 1          | 1     | 38.2 ms           |
+| Static | 4          | 1     | 120.2 ms          |
+| Static | 1          | 4     | 68.7 ms           |
+| Static | 4          | 4     | 192.6 ms          |
 
 
 "Static" means the engine is built for the given combination of batch size and image size, and CUDA graph is used to speed it up.
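As a quick sanity check on the latency table in the diff above, the per-image latency follows from dividing the reported batch latency by the batch size. This is a minimal sketch; the numbers are copied from the ONNX Runtime CUDA column, and the helper name `per_image_ms` is illustrative, not part of the repository:

```python
# Total latency (ms) for each (batch_size, steps) configuration,
# taken from the ONNX Runtime CUDA column of the table above.
latency_ms = {
    (1, 1): 38.2,
    (4, 1): 120.2,
    (1, 4): 68.7,
    (4, 4): 192.6,
}


def per_image_ms(batch_size: int, steps: int) -> float:
    """Average latency per generated image for one configuration."""
    return latency_ms[(batch_size, steps)] / batch_size


# Batching amortizes per-launch overhead: 4 images in one static batch
# cost less per image than 4 separate single-image runs.
for (batch, steps), total in sorted(latency_ms.items()):
    print(f"batch={batch} steps={steps}: {per_image_ms(batch, steps):.2f} ms/image")
```

For example, at one step the per-image cost drops from 38.2 ms at batch size 1 to about 30 ms at batch size 4, which is why the static batch-4 engine is preferable for throughput-oriented workloads.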