Diffusers documentation

Conditional image generation

You are viewing the v0.15.0 version of this documentation. A newer version, v0.31.0, is available.

Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise.

The DiffusionPipeline is the easiest way to use a pre-trained diffusion system for inference.

Start by creating an instance of DiffusionPipeline and specifying which pipeline checkpoint you would like to download.

In this guide, you’ll use DiffusionPipeline for text-to-image generation with Latent Diffusion:

>>> from diffusers import DiffusionPipeline

>>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. You can move the generator object to a GPU, just like you would in PyTorch:

>>> generator.to("cuda")
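If the code may also run on machines without a GPU, the device can be chosen at runtime instead of hard-coding "cuda". The following is a minimal sketch, not part of the original guide: `pick_device` is a hypothetical helper name, and it assumes that falling back to the CPU (much slower for a model of this size) is acceptable.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; it is not part of diffusers.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Usage with the pipeline object created above (named `generator` in this guide):
# generator.to(pick_device())
```

On a machine with no CUDA device this returns "cpu", so the rest of the guide still runs, only more slowly.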

Now you can use the generator on your text prompt:

>>> image = generator("An image of a squirrel in Picasso style").images[0]

By default, the output is returned as a PIL.Image object.

You can save the image by calling:

>>> image.save("image_of_squirrel_painting.png")

Feel free to experiment with the guidance scale parameter to see how it affects the image quality!
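One way to explore the guidance scale locally is to generate the same prompt at several values and compare the results. The sketch below is an illustration under stated assumptions rather than part of the original guide: `sweep_guidance_scale` and `make_generator` are hypothetical names, and it assumes the pipeline call accepts the `guidance_scale` and `generator` keyword arguments, as the text-to-image pipelines in diffusers generally do.

```python
def sweep_guidance_scale(pipe, prompt, scales, make_generator=None, **kwargs):
    """Call `pipe` once per guidance scale and return {scale: first image}.

    Hypothetical helper for illustration; it is not part of diffusers.
    `make_generator` is an optional zero-argument callable that returns a
    freshly seeded random generator, so every run starts from the same
    noise and only the guidance scale differs between images.
    """
    results = {}
    for scale in scales:
        call_kwargs = dict(kwargs, guidance_scale=scale)
        if make_generator is not None:
            call_kwargs["generator"] = make_generator()
        # The pipeline output exposes the generated images as a list.
        results[scale] = pipe(prompt, **call_kwargs).images[0]
    return results


# Usage with the pipeline created above (assumes PyTorch is installed):
# import torch
# images = sweep_guidance_scale(
#     generator,
#     "An image of a squirrel in Picasso style",
#     scales=[1.0, 3.0, 7.5],
#     make_generator=lambda: torch.Generator().manual_seed(0),
# )
# for scale, image in images.items():
#     image.save(f"squirrel_guidance_{scale}.png")
```

Reseeding before each call keeps the starting noise fixed, which makes differences between the saved images easier to attribute to the guidance scale alone.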