metadata
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
language:
  - en
tags:
  - stable-diffusion
  - stable-diffusion-xl
  - tensorrt
  - text-to-image

Stable Diffusion XL 1.0 TensorRT

Introduction

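This repository hosts the TensorRT versions of Stable Diffusion XL 1.0, created in collaboration with NVIDIA. For 30 steps at 1024x1024, the optimized versions cut latency by roughly 13% to 41% and raise image throughput by roughly 20% to 70%, depending on the accelerator (see Performance Comparison).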


Model Description

Performance Comparison

Timings for 30 steps at 1024x1024

| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|---|---|---|---|
| A10 | 9399 ms | 8160 ms | ~13% |
| A100 | 3704 ms | 2742 ms | ~26% |
| H100 | 2496 ms | 1471 ms | ~41% |
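
The percentages in the timing table can be reproduced from the millisecond figures. The sketch below assumes that "percentage improvement" means the relative reduction in latency against the baseline; the README does not define the term, but the numbers agree with that reading. The function name is illustrative only.

```python
# (baseline_ms, tensorrt_ms) copied from the timing table above.
timings_ms = {
    "A10": (9399, 8160),
    "A100": (3704, 2742),
    "H100": (2496, 1471),
}


def latency_reduction_pct(baseline_ms, optimized_ms):
    """Share of the baseline latency saved by the optimized run, in percent.

    Assumption: this is how the table's "percentage improvement" is defined.
    """
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms


for gpu, (baseline, optimized) in timings_ms.items():
    print(f"{gpu}: {latency_reduction_pct(baseline, optimized):.1f}% lower latency")
```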

Image throughput for 30 steps at 1024x1024

| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|---|---|---|---|
| A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
| A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
| H100 | 0.40 images/sec | 0.68 images/sec | ~70% |
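
The throughput figures appear to be the reciprocal of the latencies, and the percentages appear to be computed from the rounded images/sec values. The following sketch checks both readings. It assumes one image per run (batch size 1), which the README does not state, and the helper names are hypothetical.

```python
def images_per_second(latency_ms, batch_size=1):
    """Throughput implied by a per-run latency.

    Assumption: each run produces batch_size images; 1 is a guess, not
    something the README specifies.
    """
    return batch_size * 1000.0 / latency_ms


def gain_pct(baseline, optimized):
    """Relative increase of optimized over baseline, in percent."""
    return 100.0 * (optimized / baseline - 1.0)


# (baseline, tensorrt) throughput in images/sec, copied from the table above.
throughput = {"A10": (0.10, 0.12), "A100": (0.27, 0.36), "H100": (0.40, 0.68)}

for gpu, (baseline, optimized) in throughput.items():
    print(f"{gpu}: {gain_pct(baseline, optimized):.0f}% higher throughput")

# Cross-check against the H100 TensorRT latency from the timing table.
print(f"H100 at 1471 ms per image: {images_per_second(1471):.2f} images/sec")
```

Because the table rounds throughput to two decimals, gains recomputed from the raw latencies differ by a few points for the slower accelerators.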

Usage Example

  1. Follow the setup instructions for launching a TensorRT NGC container.
```bash
git clone https://github.com/rajeevsrao/TensorRT.git
cd TensorRT
git checkout release/8.6
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.06-py3 /bin/bash
```
  2. Download the SDXL TensorRT files from this repo
```bash
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt
cd stable-diffusion-xl-1.0-tensorrt
git lfs pull
cd ..
```
  3. Install libraries and requirements
```bash
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt

cd demo/Diffusion
pip3 install -r requirements.txt
```
  4. Perform TensorRT optimized inference
```bash
python3 demo_txt2img_xl.py \
  "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --build-static-batch \
  --use-cuda-graph \
  --num-warmup-runs 1 \
  --width 1024 \
  --height 1024 \
  --denoising-steps 30 \
  --onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \
  --onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner
```
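
The download and inference steps above can be sketched as a small Python helper, which may be useful when scripting several runs. It does two things: it looks for files that are still Git LFS pointer stubs (a sign that `git lfs pull` has not finished), and it assembles the same `demo_txt2img_xl.py` invocation from parameters. The helper names `find_lfs_pointers` and `build_demo_command` are hypothetical and not part of the demo; the flags and directory names are the ones used in the command above.

```python
import shlex
from pathlib import Path

# First bytes of a Git LFS pointer file, i.e. a placeholder whose real
# content has not been downloaded yet.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that are still Git LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs


def build_demo_command(prompt, model_root, width=1024, height=1024, steps=30):
    """Assemble the demo_txt2img_xl.py invocation used in the inference step."""
    root = Path(model_root)
    return [
        "python3", "demo_txt2img_xl.py", prompt,
        "--build-static-batch",
        "--use-cuda-graph",
        "--num-warmup-runs", "1",
        "--width", str(width),
        "--height", str(height),
        "--denoising-steps", str(steps),
        "--onnx-base-dir", str(root / "sdxl-1.0-base"),
        "--onnx-refiner-dir", str(root / "sdxl-1.0-refiner"),
    ]


if __name__ == "__main__":
    model_root = "/workspace/stable-diffusion-xl-1.0-tensorrt"
    missing = find_lfs_pointers(model_root)
    if missing:
        print(f"{len(missing)} file(s) still need `git lfs pull`")
    cmd = build_demo_command(
        "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        model_root,
    )
    # Print a copy-pasteable command; run it from demo/Diffusion.
    print(shlex.join(cmd))
```

Returning the command as a list keeps the prompt as a single argument regardless of spaces or commas, so it can be passed to `subprocess.run(cmd, check=True, cwd="demo/Diffusion")` without shell quoting.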