Pipelines

Pipelines provide a simple way to run state-of-the-art diffusion models in inference. Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler components - all of which are needed to have a functioning end-to-end diffusion system.

As an example, Stable Diffusion has three independently trained models:

  • an autoencoder (VAE)
  • a conditional U-Net
  • a CLIP text encoder

To that end, we strive to offer all open-sourced, state-of-the-art diffusion systems under a unified API. More specifically, we strive to provide pipelines that can load the officially published weights and produce results comparable to the original implementation, expose a simple user interface for running the model in inference, and are easy to understand and to contribute to.

Note that pipelines do not (and should not) offer any training functionality. If you are looking for official training examples, please have a look at examples.
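As a quick sketch of how such a composed system looks in code (assuming the runwayml/stable-diffusion-v1-5 checkpoint on the Hub), the loaded pipeline exposes the independently trained components as attributes:

```python
from diffusers import StableDiffusionPipeline

# Load all components of the diffusion system in one call
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# The independently trained models are plain attributes of the pipeline
print(type(pipe.vae))           # AutoencoderKL - the autoencoder
print(type(pipe.unet))          # UNet2DConditionModel - the conditional U-Net
print(type(pipe.text_encoder))  # CLIPTextModel - the CLIP text encoder from Transformers
print(type(pipe.scheduler))     # the scheduler, which can be swapped out
```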

🧨 Diffusers Summary

The following table summarizes all officially supported pipelines, their corresponding paper, and, if available, a Colab notebook to try them out directly.

| Pipeline | Paper | Tasks | Colab |
|---|---|---|---|
| alt_diffusion | AltDiffusion | Image-to-Image Text-Guided Generation | - |
| audio_diffusion | Audio Diffusion | Unconditional Audio Generation | - |
| controlnet | ControlNet with Stable Diffusion | Image-to-Image Text-Guided Generation | Open In Colab |
| cycle_diffusion | Cycle Diffusion | Image-to-Image Text-Guided Generation | - |
| dance_diffusion | Dance Diffusion | Unconditional Audio Generation | - |
| ddpm | Denoising Diffusion Probabilistic Models | Unconditional Image Generation | - |
| ddim | Denoising Diffusion Implicit Models | Unconditional Image Generation | - |
| if | IF | Image Generation | Open In Colab |
| if_img2img | IF | Image-to-Image Generation | Open In Colab |
| if_inpainting | IF | Image-to-Image Generation | Open In Colab |
| kandinsky | Kandinsky | Text-to-Image Generation | - |
| kandinsky_inpaint | Kandinsky | Image-to-Image Generation | - |
| kandinsky_img2img | Kandinsky | Image-to-Image Generation | - |
| latent_diffusion | High-Resolution Image Synthesis with Latent Diffusion Models | Text-to-Image Generation | - |
| latent_diffusion | High-Resolution Image Synthesis with Latent Diffusion Models | Super Resolution Image-to-Image | - |
| latent_diffusion_uncond | High-Resolution Image Synthesis with Latent Diffusion Models | Unconditional Image Generation | - |
| paint_by_example | Paint by Example: Exemplar-based Image Editing with Diffusion Models | Image-Guided Image Inpainting | - |
| paradigms | Parallel Sampling of Diffusion Models | Text-to-Image Generation | - |
| pndm | Pseudo Numerical Methods for Diffusion Models on Manifolds | Unconditional Image Generation | - |
| score_sde_ve | Score-Based Generative Modeling through Stochastic Differential Equations | Unconditional Image Generation | - |
| score_sde_vp | Score-Based Generative Modeling through Stochastic Differential Equations | Unconditional Image Generation | - |
| semantic_stable_diffusion | SEGA: Instructing Diffusion using Semantic Dimensions | Text-to-Image Generation | - |
| stable_diffusion_text2img | Stable Diffusion | Text-to-Image Generation | Open In Colab |
| stable_diffusion_img2img | Stable Diffusion | Image-to-Image Text-Guided Generation | Open In Colab |
| stable_diffusion_inpaint | Stable Diffusion | Text-Guided Image Inpainting | Open In Colab |
| stable_diffusion_panorama | MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation | Text-Guided Panorama View Generation | - |
| stable_diffusion_pix2pix | InstructPix2Pix: Learning to Follow Image Editing Instructions | Text-Based Image Editing | - |
| stable_diffusion_pix2pix_zero | Zero-shot Image-to-Image Translation | Text-Based Image Editing | - |
| stable_diffusion_attend_and_excite | Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models | Text-to-Image Generation | - |
| stable_diffusion_self_attention_guidance | Self-Attention Guidance | Text-to-Image Generation | - |
| stable_diffusion_image_variation | Stable Diffusion Image Variations | Image-to-Image Generation | - |
| stable_diffusion_latent_upscale | Stable Diffusion Latent Upscaler | Text-Guided Super Resolution Image-to-Image | - |
| stable_diffusion_2 | Stable Diffusion 2 | Text-Guided Image Inpainting | - |
| stable_diffusion_2 | Stable Diffusion 2 | Depth-to-Image Text-Guided Generation | - |
| stable_diffusion_2 | Stable Diffusion 2 | Text-Guided Super Resolution Image-to-Image | - |
| stable_diffusion_safe | Safe Stable Diffusion | Text-Guided Generation | Open In Colab |
| stable_unclip | Stable unCLIP | Text-to-Image Generation | - |
| stable_unclip | Stable unCLIP | Image-to-Image Text-Guided Generation | - |
| stochastic_karras_ve | Elucidating the Design Space of Diffusion-Based Generative Models | Unconditional Image Generation | - |
| text_to_video_sd | Modelscope’s Text-to-video-synthesis Model in Open Domain | Text-to-Video Generation | - |
| unclip | Hierarchical Text-Conditional Image Generation with CLIP Latents | Text-to-Image Generation | - |
| versatile_diffusion | Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Text-to-Image Generation | - |
| versatile_diffusion | Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Image Variations Generation | - |
| versatile_diffusion | Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Dual Image and Text Guided Generation | - |
| vq_diffusion | Vector Quantized Diffusion Model for Text-to-Image Synthesis | Text-to-Image Generation | - |
| text_to_video_zero | Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | Text-to-Video Generation | - |

Note: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers.

However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the Examples below.
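For instance, here is a minimal sketch of swapping the scheduler of a loaded pipeline (assuming the runwayml/stable-diffusion-v1-5 checkpoint; any scheduler compatible with the pipeline can be used):

```python
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Replace the default scheduler with a different one, reusing its configuration
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
```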

Pipelines API

Diffusion models often consist of multiple independently-trained models or other previously existing components.

Each model has been trained independently on a different task, and the scheduler can easily be swapped out and replaced with a different one. During inference, however, we want to be able to easily load all components and use them together - even if one component, e.g. CLIP’s text encoder, originates from a different library, such as Transformers. To that end, all pipelines provide the following functionality (a short usage sketch follows the list):

  • from_pretrained method that accepts a Hugging Face Hub repository id, e.g. runwayml/stable-diffusion-v1-5, or a path to a local directory, e.g. "./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a model_index.json file, e.g. runwayml/stable-diffusion-v1-5/model_index.json, which defines all components that should be loaded into the pipeline. More specifically, for each model/component one needs to define the format <name>: ["<library>", "<class name>"]. <name> is the attribute name given to the loaded instance of <class name>, which can be found in the library or pipeline folder called "<library>".
  • save_pretrained method that accepts a local path, e.g. ./stable-diffusion, under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, e.g. ./stable-diffusion/unet. In addition, a model_index.json file is created at the root of the local path, e.g. ./stable-diffusion/model_index.json, so that the complete pipeline can again be instantiated from the local path.
  • to method, which accepts a string or torch.device and moves all models that are of type torch.nn.Module to the passed device. The behavior is fully analogous to PyTorch’s to method.
  • __call__ method to use the pipeline in inference. __call__ defines the inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the __call__ method can vary strongly from pipeline to pipeline. E.g., a text-to-image pipeline, such as StableDiffusionPipeline, should accept, among other things, the text prompt used to generate the image. A pure image generation pipeline, such as DDPMPipeline, on the other hand, can be run without providing any inputs. To better understand which inputs can be adapted for each pipeline, one should look directly into the respective pipeline.
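A minimal end-to-end sketch of this API (assuming the runwayml/stable-diffusion-v1-5 checkpoint and a CUDA device; any other pipeline or device works analogously):

```python
import torch
from diffusers import StableDiffusionPipeline

# from_pretrained: loads every component listed in the repository's model_index.json
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# to: moves all torch.nn.Module components to the given device
pipe = pipe.to("cuda")

# __call__: runs inference; the signature is pipeline-specific
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")

# save_pretrained: writes each component to its own subfolder plus a model_index.json
pipe.save_pretrained("./stable-diffusion")
```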

Note: All pipelines have PyTorch’s autograd disabled by decorating the __call__ method with a torch.no_grad decorator because pipelines should not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our community-examples.
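As a rough sketch of such a custom pipeline (the class name and defaults below are illustrative, not part of the library), the key difference is simply that __call__ is not decorated with torch.no_grad, so gradients can flow through the denoising loop:

```python
import torch
from diffusers import DiffusionPipeline

class GradTrackingDDPMPipeline(DiffusionPipeline):
    """Hypothetical unconditional pipeline that keeps autograd enabled during sampling."""

    def __init__(self, unet, scheduler):
        super().__init__()
        self.register_modules(unet=unet, scheduler=scheduler)

    def __call__(self, batch_size=1, num_inference_steps=50):
        # No @torch.no_grad() decorator here, unlike the built-in pipelines
        sample = torch.randn(
            batch_size,
            self.unet.config.in_channels,
            self.unet.config.sample_size,
            self.unet.config.sample_size,
            device=self.unet.device,
        )
        self.scheduler.set_timesteps(num_inference_steps)
        for t in self.scheduler.timesteps:
            noise_pred = self.unet(sample, t).sample
            sample = self.scheduler.step(noise_pred, t, sample).prev_sample
        return sample
```

Such a pipeline can be instantiated from any compatible unet and scheduler (or loaded via from_pretrained from a matching checkpoint), and its output can then be differentiated like any other PyTorch tensor.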