---
pipeline_tag: text-to-image
license: other
license_name: sai-nc-community
license_link: https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE.TXT
base_model: stabilityai/sd-turbo
language:
- en
tags:
- stable-diffusion
- sdxl
- onnxruntime
- onnx
- text-to-image
---

# Stable Diffusion Turbo for ONNX Runtime CUDA

## Introduction

This repository hosts optimized ONNX models of SD Turbo for accelerated inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. The models cannot run with other execution providers such as CPU or DirectML.

The models were generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with a command like the following:
```shell
python stable_diffusion.py --provider cuda --model_id stabilityai/sd-turbo --optimize --use_fp16_fixed_vae
```

See the [usage instructions](#usage-example) for how to run the SD Turbo pipeline with the ONNX files hosted in this repository.

## Model Description

- Developed by: Stability AI
- Model type: Diffusion-based text-to-image generative model
- License: [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
- Model Description: This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.

## Performance

### Latency

Below is the average latency of generating 512x512 images on an NVIDIA A100-SXM4-80GB GPU:

| Engine | Batch Size | Steps | ONNX Runtime CUDA |
|--------|------------|-------|-------------------|
| Static | 1          | 1     | 38.2 ms           |
| Static | 4          | 1     | 120.2 ms          |
| Static | 1          | 4     | 68.7 ms           |
| Static | 4          | 4     | 192.6 ms          |


"Static" means the engine is built for one specific combination of batch size and image size, and CUDA graphs are used to speed up inference.
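As a rough reading of the table, dividing each latency by its batch size gives the per-image cost, and the gap between the 4-step and 1-step rows hints at the cost of each extra denoising step. The sketch below just does that arithmetic with `awk` on the numbers copied from the table; the per-step figure assumes latency grows linearly with the step count, which is an assumption, not a measurement.

```shell
#!/usr/bin/env bash
# Latency figures (ms) copied from the A100 table above: batch size, steps, batch latency.
latency_table() {
  cat <<'EOF'
1 1 38.2
4 1 120.2
1 4 68.7
4 4 192.6
EOF
}

# Per-image latency: batch latency divided by batch size.
per_image_ms() {
  awk -v total="$1" -v batch="$2" 'BEGIN { printf "%.2f\n", total / batch }'
}

# Rough cost of one extra step, assuming latency grows linearly with steps:
# (latency at 4 steps - latency at 1 step) / 3.
per_step_ms() {
  awk -v one="$1" -v four="$2" 'BEGIN { printf "%.2f\n", (four - one) / 3 }'
}

latency_table | while read -r batch steps ms; do
  echo "batch=$batch steps=$steps per_image=$(per_image_ms "$ms" "$batch") ms"
done
echo "per extra step, batch 1: $(per_step_ms 38.2 68.7) ms"
echo "per extra step, batch 4: $(per_step_ms 120.2 192.6) ms"
```

With these numbers, batching four images lowers the per-image cost from 38.2 ms to about 30.05 ms at one step (and from 68.7 ms to about 48.15 ms at four steps), while each additional step adds roughly 10.17 ms at batch size 1.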


## Usage Example

Follow the [demo instructions](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md#run-demo-with-docker). Example steps:

0. Install the NVIDIA Container Toolkit (nvidia-docker) by following these [instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

1. Clone the onnxruntime repository.
```shell
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime
```

2. Download the ONNX files from this repository into the onnxruntime directory (Git LFS is required).
```shell
git lfs install
git clone https://huggingface.co/tlwu/sd-turbo-onnxruntime
```
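If Git LFS was not active during the clone, the `.onnx` files are left as small text pointer files instead of the model weights, and loading them fails. A minimal check, assuming it is run from the directory where `sd-turbo-onnxruntime/` was cloned, is to look for `.onnx` files that still begin with the Git LFS pointer header. The helper names below are hypothetical; if any pointers are reported, running `git lfs pull` inside the repository should fetch the real content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file is a Git LFS pointer rather than real content.
is_lfs_pointer() {
  head -c 100 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every .onnx file under a directory that is still an LFS pointer.
find_lfs_pointers() {
  find "$1" -type f -name '*.onnx' | while read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

repo_dir="sd-turbo-onnxruntime"
if [ -d "$repo_dir" ]; then
  pointers=$(find_lfs_pointers "$repo_dir")
  if [ -n "$pointers" ]; then
    echo "Still LFS pointers (run 'git lfs pull' in $repo_dir):"
    printf '%s\n' "$pointers"
  else
    echo "All .onnx files in $repo_dir contain real data."
  fi
fi
```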

3. Launch the Docker container, mounting the current (onnxruntime) directory at `/workspace`.
```shell
docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:23.10-py3 /bin/bash
```

4. Build ONNX Runtime from source.
```shell
# Point the build at the CUDA 12.2 compiler in the container.
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
# Avoid git "dubious ownership" errors on the mounted repository.
git config --global --add safe.directory '*'
# Build a Release wheel with the CUDA and TensorRT execution providers, skipping unit tests.
sh build.sh --config Release --build_shared_lib --parallel --use_cuda --cuda_version 12.2 \
--cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/x86_64-linux-gnu/ --build_wheel --skip_tests \
--use_tensorrt --tensorrt_home /usr/src/tensorrt \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 \
--allow_running_as_root
# Install the freshly built wheel (the cp310 tag matches the container's Python 3.10).
python3 -m pip install build/Linux/Release/dist/onnxruntime_gpu-*-cp310-cp310-linux_x86_64.whl --force-reinstall
```

If the GPU is not an A100, change `CMAKE_CUDA_ARCHITECTURES=80` in the command above to match the GPU's compute capability (for example, 89 for RTX 4090 or 86 for RTX 3090). If the machine has less than 64 GB of memory, replace `--parallel` with `--parallel 4 --nvcc_threads 1` to avoid running out of memory.
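Both adjustments can be derived from the machine itself. The sketch below is a hypothetical helper, not part of the onnxruntime build scripts: it assumes an NVIDIA driver recent enough to support the `compute_cap` field of `nvidia-smi --query-gpu`, reads total memory from `/proc/meminfo`, and prints the `CMAKE_CUDA_ARCHITECTURES` define and parallelism flags to substitute into the `build.sh` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of onnxruntime: suggest build.sh flags for this machine.

# Convert a compute capability such as "8.6" into the CMake value "86".
cuda_arch_from_capability() {
  printf '%s\n' "$1" | tr -d '. '
}

# Keep plain --parallel with at least 64 GB of RAM, otherwise use the lower-memory flags.
# /proc/meminfo reports KiB; 64 GB = 64e9 bytes = 62500000 KiB.
parallel_flags_for_mem_kb() {
  if [ "$1" -ge 62500000 ]; then
    printf '%s\n' "--parallel"
  else
    printf '%s\n' "--parallel 4 --nvcc_threads 1"
  fi
}

# Query the hardware only when nvidia-smi and /proc/meminfo are available.
if command -v nvidia-smi >/dev/null 2>&1 && [ -r /proc/meminfo ]; then
  # Assumes the driver supports the compute_cap query field; only the first GPU is used.
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=$(cuda_arch_from_capability "$cap")"
  parallel_flags_for_mem_kb "$mem_kb"
fi
```

The 64 GB threshold is taken from the guidance above; on machines close to that boundary, the lower-memory flags are the safer choice.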

5. Install the demo's Python libraries and requirements.
```shell
python3 -m pip install --upgrade pip
cd /workspace/onnxruntime/python/tools/transformers/models/stable_diffusion
python3 -m pip install -r requirements-cuda12.txt
python3 -m pip install --upgrade polygraphy onnx-graphsurgeon --extra-index-url https://pypi.ngc.nvidia.com
```
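Because these models only run with the CUDA execution provider, it may be worth confirming before the demo that the `onnxruntime` package Python imports actually exposes `CUDAExecutionProvider`. The sketch below calls `onnxruntime.get_available_providers()`; the `has_cuda_provider` shell wrapper is a hypothetical convenience, and the query is skipped if `onnxruntime` cannot be imported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a comma-separated provider list contains CUDAExecutionProvider.
has_cuda_provider() {
  case "$1" in
    *CUDAExecutionProvider*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the installed onnxruntime package which execution providers it supports.
if providers=$(python3 -c 'import onnxruntime as ort; print(",".join(ort.get_available_providers()))' 2>/dev/null); then
  echo "Available providers: $providers"
  if has_cuda_provider "$providers"; then
    echo "OK: CUDAExecutionProvider is available."
  else
    echo "CUDAExecutionProvider not found; check that the onnxruntime_gpu wheel from step 4 is installed." >&2
  fi
fi
```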

6. Run optimized inference with ONNX Runtime.
```shell
python3 demo_txt2img.py \
"starry night over Golden Gate Bridge by van gogh" \
--version sd-turbo \
--engine-dir /workspace/sd-turbo-onnxruntime
```