	---
	pipeline_tag: text-to-image
	license: other
	license_name: sai-nc-community
	license_link: https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE.TXT
	base_model: stabilityai/sd-turbo
	language:
	- en
	tags:
	- stable-diffusion
	- sdxl
	- onnxruntime
	- onnx
	- text-to-image
	---

	# SD Turbo for ONNX Runtime

	## Introduction

	This repository hosts optimized ONNX versions of SD Turbo that accelerate inference with the ONNX Runtime CUDA execution provider.

	See the [usage instructions](#usage-example) for how to run the SD Turbo pipeline with the ONNX files hosted in this repository.

	## Model Description

	- Developed by: Stability AI
	- Model type: Diffusion-based text-to-image generative model
	- License: [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
	- Model Description: This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.

	## Performance Comparison

	#### Latency

	The table below shows the average latency of generating a 512x512 image on an NVIDIA A100-SXM4-80GB GPU:

	| Engine | Batch Size | Steps | PyTorch 2.1 | ONNX Runtime CUDA |
	|--------|------------|-------|-------------|-------------------|
	| Static | 1          | 1     | 85.3 ms     | 32.9 ms           |
	| Static | 4          | 1     | 213.8 ms    | 97.5 ms           |
	| Static | 1          | 4     | 117.4 ms    | 62.5 ms           |
	| Static | 4          | 4     | 294.3 ms    | 168.3 ms          |


	Static means that the engine is built for a specific combination of batch size and image size, and that CUDA graph is used to speed up inference.
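
The speedup implied by the latency table can be worked out with a little arithmetic. The sketch below hard-codes the four rows (batch size, steps, PyTorch latency in ms, ONNX Runtime latency in ms) and divides the PyTorch latency by the ONNX Runtime latency. The function name `speedups` is only illustrative and is not part of any tool in this repository.

```shell
# Print the PyTorch / ONNX Runtime latency ratio for each row of the table.
# Columns: batch size, steps, PyTorch 2.1 latency (ms), ONNX Runtime CUDA latency (ms).
speedups() {
  awk '{ printf "batch=%s steps=%s speedup=%.2fx\n", $1, $2, $3 / $4 }' <<'EOF'
1 1 85.3 32.9
4 1 213.8 97.5
1 4 117.4 62.5
4 4 294.3 168.3
EOF
}

speedups
```

With these numbers the ratio ranges from about 1.75x (batch size 4, 4 steps) to about 2.59x (batch size 1, 1 step).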


	## Usage Example

	Follow the [demo instructions](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md#run-demo-with-docker). Example steps:

	0. Install nvidia-docker using these [instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

	1. Clone the onnxruntime repository.
	```shell
	git clone https://github.com/microsoft/onnxruntime
	cd onnxruntime
	```

	2. Download the SD Turbo ONNX files from this repo
	```shell
	git lfs install
	git clone https://huggingface.co/tlwu/sd-turbo-onnxruntime
	```

	3. Launch the Docker container
	```shell
	docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.10-py3 /bin/bash
	```

	4. Build ONNX Runtime from source
	```shell
	export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
	git config --global --add safe.directory '*'
	sh build.sh --config Release --build_shared_lib --parallel --use_cuda --cuda_version 12.2 \
	--cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/x86_64-linux-gnu/ --build_wheel --skip_tests \
	--use_tensorrt --tensorrt_home /usr/src/tensorrt \
	--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
	--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 \
	--allow_running_as_root
	python3 -m pip install build/Linux/Release/dist/onnxruntime_gpu-*-cp310-cp310-linux_x86_64.whl --force-reinstall
	```

	If the GPU is not an A100, change `CMAKE_CUDA_ARCHITECTURES=80` in the command line to match the GPU's compute capability (for example, 89 for RTX 4090 or 86 for RTX 3090). If your machine has less than 64GB of memory, replace `--parallel` with `--parallel 4 --nvcc_threads 1` to avoid running out of memory during the build.
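
If the compute capability of the GPU is not known, it can usually be queried rather than looked up. The sketch below is one possible way to do that. It assumes the installed driver's `nvidia-smi` supports the `compute_cap` query field, which older drivers may not, and `cap_to_arch` is a hypothetical helper name introduced here for illustration.

```shell
# Convert a compute capability such as "8.9" into the value CMake expects ("89").
cap_to_arch() {
  printf '%s\n' "$1" | tr -d '.'
}

# Assumption: nvidia-smi is on PATH and understands the compute_cap query field.
# Only the first GPU is inspected; nothing is printed if nvidia-smi is missing.
if command -v nvidia-smi >/dev/null 2>&1; then
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  echo "CMAKE_CUDA_ARCHITECTURES=$(cap_to_arch "$cap")"
fi
```

The printed value can then replace `80` in the build command of step 4.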

	5. Install libraries and requirements
	```shell
	python3 -m pip install --upgrade pip
	cd /workspace/onnxruntime/python/tools/transformers/models/stable_diffusion
	python3 -m pip install -r requirements-cuda12.txt
	python3 -m pip install --upgrade polygraphy onnx-graphsurgeon --extra-index-url https://pypi.ngc.nvidia.com
	```

	6. Perform ONNX Runtime optimized inference
	```shell
	python3 demo_txt2img.py \
	"starry night over Golden Gate Bridge by van gogh" \
	--version sd-turbo \
	--work-dir /workspace/sd-turbo-onnxruntime
	```
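
If the demo does not appear to use the GPU, one quick sanity check is whether the wheel installed in step 4 exposes the CUDA execution provider. The sketch below relies on `onnxruntime.get_available_providers()`; the helper name `has_cuda_provider` is hypothetical and exists only for this example.

```shell
# Succeed if the provider list passed as $1 mentions the CUDA execution provider.
has_cuda_provider() {
  case "$1" in
    *CUDAExecutionProvider*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the installed onnxruntime package which providers it was built with.
# If python3 or onnxruntime is missing, `providers` stays empty.
providers=$(python3 -c 'import onnxruntime as ort; print(ort.get_available_providers())' 2>/dev/null || true)

if has_cuda_provider "$providers"; then
  echo "CUDA execution provider is available"
else
  echo "CUDA execution provider not found; got: ${providers:-nothing}"
fi
```

If the provider is missing, the usual suspects are a CPU-only `onnxruntime` package shadowing the built wheel, or a build that did not pick up `--use_cuda`.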