Conditional image generation
Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings, which are used to condition the model to generate an image from noise.
The DiffusionPipeline is the easiest way to use a pre-trained diffusion system for inference.
Start by creating an instance of DiffusionPipeline and specifying which pipeline checkpoint you would like to download.
In this guide, you’ll use DiffusionPipeline for text-to-image generation with Latent Diffusion:
>>> from diffusers import DiffusionPipeline
>>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. You can move the generator object to a GPU, just like you would in PyTorch:
>>> generator.to("cuda")
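If you are not sure whether a GPU is available, you can choose the device at runtime instead of hard-coding "cuda". The following is a minimal sketch: `pick_device` is a hypothetical helper, not part of diffusers, and it assumes that falling back to the CPU is acceptable when PyTorch cannot see a GPU (generation will be much slower there).

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the helper also works where PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

You can then pass the result to the pipeline with `generator.to(device)`.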
Now you can use the generator on your text prompt:
>>> image = generator("An image of a squirrel in Picasso style").images[0]
By default, the output is wrapped in a PIL.Image object.
You can save the image by calling:
>>> image.save("image_of_squirrel_painting.png")
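The steps in this guide can be collected into one function. This is only a sketch: `generate_and_save` is a hypothetical helper name, and it uses nothing beyond the calls shown in this guide. The import sits inside the function so that defining the helper does not require diffusers to be installed.

```python
def generate_and_save(prompt, output_path,
                      checkpoint="CompVis/ldm-text2im-large-256",
                      device="cuda"):
    """Load the pipeline, generate one image for `prompt`, and save it.

    Hypothetical convenience wrapper around the steps in this guide.
    """
    # Lazy import: diffusers is only needed when the function is called.
    from diffusers import DiffusionPipeline

    # Downloads the checkpoint on first use and reads it from the cache afterwards.
    generator = DiffusionPipeline.from_pretrained(checkpoint)
    generator.to(device)

    # The pipeline output exposes the generated images as a list.
    image = generator(prompt).images[0]
    image.save(output_path)
    return image
```

For example, `generate_and_save("An image of a squirrel in Picasso style", "image_of_squirrel_painting.png")` reproduces the walkthrough above. If you generate images for several prompts, load the pipeline once and reuse it rather than calling this helper in a loop, since it reloads the pipeline on every call.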
Try out the Spaces for this pipeline, and experiment with the guidance scale parameter to see how it affects the image quality!
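To build some intuition for the guidance scale, here is a toy sketch of the classifier-free guidance rule that text-to-image diffusion pipelines commonly use. It assumes the guidance scale here refers to that technique. The function name and the numbers are illustrative only and are not real model outputs.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger values push the result further from the unconditional one.
    """
    return [u + guidance_scale * (t - u)
            for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for two model predictions (assumed, not real data).
uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

for scale in (1.0, 3.0, 7.5):
    print(scale, apply_guidance(uncond, text, scale))
```

Higher values generally make the image follow the prompt more closely, often at some cost to image quality or diversity. Assuming the pipeline accepts a `guidance_scale` keyword argument, as the Latent Diffusion text-to-image pipeline in diffusers does, you can try different values with a call such as `generator(prompt, guidance_scale=6.0)`.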