Stable-fast-xl

Stable-fast is an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs, providing very fast inference through a set of key optimization techniques. This repository contains a compact installation of the stable-fast compiler (https://github.com/chengzeyi/stable-fast) together with inference examples for stable-diffusion-xl-base-1.0 and stable-diffusion-xl-1.0-inpainting-0.1.


Run SDXL inference 30%+ faster!

Differences With Other Acceleration Libraries

Fast:

stable-fast is specially optimized for HuggingFace Diffusers and achieves high performance compared with other acceleration libraries. It also provides very fast compilation, within only a few seconds, significantly faster than torch.compile, TensorRT, and AITemplate.

Minimal:

stable-fast works as a plugin framework for PyTorch. It builds on existing PyTorch functionality and infrastructure and is compatible with other acceleration techniques, as well as popular fine-tuning techniques and deployment solutions; see the sketch below.
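For example, because compilation happens on an ordinary Diffusers pipeline object, fine-tuning artifacts such as LoRA weights can be loaded with the regular Diffusers API before compiling. A minimal sketch, where "your-username/your-sdxl-lora" is a hypothetical placeholder repository:

from diffusers import DiffusionPipeline
import torch

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
# "your-username/your-sdxl-lora" is a placeholder, not a real repository
pipe.load_lora_weights("your-username/your-sdxl-lora")
pipe.fuse_lora()  # bake the LoRA into the base weights before compilation
pipe.to("cuda")

pipe = compile(pipe, CompilationConfig.Default())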

How to use

Install dependencies

pip install diffusers transformers safetensors accelerate sentencepiece

Download the repository and run the stable-fast installation script

git clone https://huggingface.co/artemtumch/stable-fast-xl
cd stable-fast-xl

Open the install_stable-fast.sh file and replace cp311 with your Python version in this line:

pip install -q https://github.com/chengzeyi/stable-fast/releases/download/v0.0.15/stable_fast-0.0.15+torch210cu118-cp311-cp311-manylinux2014_x86_64.whl

where cp311 corresponds to Python 3.11, cp38 to Python 3.8, and so on.
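If you are unsure which tag matches your interpreter, you can print it directly:

import sys

# prints e.g. "cp311" for Python 3.11; use this tag in the wheel URL
print(f"cp{sys.version_info.major}{sys.version_info.minor}")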

Then run the script:

sh install_stable-fast.sh

Generate image

from diffusers import DiffusionPipeline
import torch

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

# make sure the optional acceleration backends are importable
import xformers
import triton

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)

# enable to reduce GPU VRAM usage (~30%)
# (requires `from diffusers import AutoencoderTiny`)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

config = CompilationConfig.Default()

config.enable_xformers = True    # memory-efficient attention
config.enable_triton = True      # Triton-fused kernels
config.enable_cuda_graph = True  # CUDA graph capture to cut CPU overhead

pipe = compile(pipe, config)

prompt = "An astronaut riding a green horse"

image = pipe(prompt=prompt).images[0]
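The first call triggers tracing and compilation, so it is noticeably slower; subsequent calls run at the optimized speed. A rough way to observe the speedup on your own GPU (timings below are illustrative, not guaranteed):

import time

# warm-up call: includes the one-time compilation/tracing overhead
begin = time.time()
pipe(prompt=prompt)
print(f"warm-up call: {time.time() - begin:.2f}s")

# subsequent calls reflect the actual optimized inference speed
begin = time.time()
image = pipe(prompt=prompt).images[0]
print(f"compiled call: {time.time() - begin:.2f}s")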

Inpainting

from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
import torch

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

# make sure the optional acceleration backends are importable
import xformers
import triton

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16"
)

# enable to reduce GPU VRAM usage (~30%)
# (requires `from diffusers import AutoencoderTiny`)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

config = CompilationConfig.Default()

config.enable_xformers = True    # memory-efficient attention
config.enable_triton = True      # Triton-fused kernels
config.enable_cuda_graph = True  # CUDA graph capture to cut CPU overhead

pipe = compile(pipe, config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
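The pipeline returns standard PIL images, so the result can be saved or post-processed as usual:

# save the inpainted result to disk
image.save("inpainting_result.png")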

GitHub repository: https://github.com/reznya22/stable-fast-xl
