Habana Gaudi
🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum. Follow the installation guide to install the SynapseAI and Gaudi drivers, and then install Optimum Habana:
python -m pip install --upgrade-strategy eager optimum[habana]
To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two objects:
- GaudiStableDiffusionPipeline, a pipeline for text-to-image generation.
- GaudiDDIMScheduler, a Gaudi-optimized scheduler.
When you initialize the pipeline, specify use_habana=True to deploy it on HPUs. To get the fastest possible generation, also enable HPU graphs with use_hpu_graphs=True.
Finally, specify a GaudiConfig, which can be downloaded from the Habana organization on the Hub.
from optimum.habana import GaudiConfig
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "stabilityai/stable-diffusion-2-base"

# Load the Gaudi-optimized scheduler from the model's "scheduler" subfolder.
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,  # deploy the pipeline on HPUs
    use_hpu_graphs=True,  # enable HPU graphs for faster generation
    gaudi_config="Habana/stable-diffusion-2",  # Gaudi configuration from the Hub
)
Now you can call the pipeline to generate images in batches from one or several prompts:
outputs = pipeline(
    prompt=[
        "High quality photo of an astronaut riding a horse in space",
        "Face of a yellow cat, high resolution, sitting on a park bench",
    ],
    num_images_per_prompt=10,
    batch_size=4,
)
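As a rough sanity check on the call above, the number of generated images and batches can be worked out by hand. The sketch below assumes that the pipeline generates num_images_per_prompt images for each prompt and groups them into batches of batch_size, padding the last batch when the total is not a multiple of the batch size. plan_batches is a hypothetical helper written for this illustration and is not part of Optimum Habana.

```python
import math


def plan_batches(num_prompts: int, num_images_per_prompt: int, batch_size: int):
    """Return (total_images, num_batches, padding) for a batched generation call.

    Assumption: images are grouped into fixed-size batches and the last
    batch is padded up to batch_size.
    """
    total_images = num_prompts * num_images_per_prompt
    num_batches = math.ceil(total_images / batch_size)
    padding = num_batches * batch_size - total_images
    return total_images, num_batches, padding


# Values from the call above: 2 prompts, 10 images each, batches of 4.
total_images, num_batches, padding = plan_batches(2, 10, 4)
print(total_images, num_batches, padding)  # 20 5 0
```

Under these assumptions, the example call produces 20 images in 5 batches with no padding. A single prompt with the same settings would give 10 images in 3 batches, the last one padded by 2.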
For more information, check out 🤗 Optimum Habana’s documentation and the example provided in the official GitHub repository.
Benchmark
We benchmarked Habana’s first-generation Gaudi and Gaudi2 with the Habana/stable-diffusion and Habana/stable-diffusion-2 Gaudi configurations (mixed precision bf16/fp32) to demonstrate their performance.
For Stable Diffusion v1.5 on 512x512 images:
| Device | Latency (batch size = 1) | Throughput |
|---|---|---|
| first-generation Gaudi | 3.80s | 0.308 images/s (batch size = 8) |
| Gaudi2 | 1.33s | 1.081 images/s (batch size = 8) |
For Stable Diffusion v2.1 on 768x768 images:
| Device | Latency (batch size = 1) | Throughput |
|---|---|---|
| first-generation Gaudi | 10.2s | 0.108 images/s (batch size = 4) |
| Gaudi2 | 3.17s | 0.379 images/s (batch size = 8) |
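To make the comparison between the two generations easier to read, the ratios can be computed directly from the figures in the tables above. This is a small illustrative calculation, not an additional benchmark. The speedup helper and the dictionary layout are assumptions made for this example. Note that for Stable Diffusion v2.1 the two throughput figures use different batch sizes (4 and 8), so that ratio is not a like-for-like comparison.

```python
# Figures copied from the benchmark tables above.
results = {
    "SD v1.5, 512x512": {
        "gaudi1": {"latency_s": 3.80, "throughput": 0.308},
        "gaudi2": {"latency_s": 1.33, "throughput": 1.081},
    },
    "SD v2.1, 768x768": {
        "gaudi1": {"latency_s": 10.2, "throughput": 0.108},
        "gaudi2": {"latency_s": 3.17, "throughput": 0.379},
    },
}


def speedup(entry):
    """Return (latency_speedup, throughput_speedup) of Gaudi2 over first-generation Gaudi."""
    # Lower latency is better, so divide the older figure by the newer one.
    latency = entry["gaudi1"]["latency_s"] / entry["gaudi2"]["latency_s"]
    # Higher throughput is better, so divide the newer figure by the older one.
    throughput = entry["gaudi2"]["throughput"] / entry["gaudi1"]["throughput"]
    return latency, throughput


for name, entry in results.items():
    latency, throughput = speedup(entry)
    print(f"{name}: latency {latency:.2f}x, throughput {throughput:.2f}x")
```

With the numbers above, Gaudi2 comes out roughly 2.9x to 3.2x faster in single-image latency and about 3.5x higher in throughput.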