Conditional image generation

Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise.

The DiffusionPipeline is the easiest way to use a pre-trained diffusion system for inference.

Start by creating an instance of DiffusionPipeline and specify which pipeline checkpoint you would like to download.

In this guide, you’ll use DiffusionPipeline for text-to-image generation with Latent Diffusion:

>>> from diffusers import DiffusionPipeline

>>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. You can move the generator object to a GPU, just like you would in PyTorch:

>>> generator.to("cuda")
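If you don't have a GPU available, you can keep the pipeline on the CPU (slower, but functional). A minimal sketch, assuming torch is importable (it is a dependency of Diffusers):

>>> import torch

>>> # move to GPU only if one is actually available, otherwise stay on CPU
>>> generator.to("cuda" if torch.cuda.is_available() else "cpu")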

Now you can use the generator on your text prompt:

>>> image = generator("An image of a squirrel in Picasso style").images[0]

By default, the output is wrapped in a PIL.Image object.

You can save the image by calling:

>>> image.save("image_of_squirrel_painting.png")

Feel free to play around with the guidance scale parameter to see how it affects the image quality!
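As a minimal sketch, you can pass the parameter directly in the pipeline call (the value 7.5 is only an illustrative choice, not a recommendation from this guide):

>>> # guidance_scale > 1 enables classifier-free guidance; higher values push the
>>> # image to follow the prompt more closely, usually at the cost of diversity
>>> image = generator("An image of a squirrel in Picasso style", guidance_scale=7.5).images[0]

>>> image.save("image_of_squirrel_painting_guided.png")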