Consistency Models

You are viewing v0.18.2 version. A newer version v0.31.0 is available.

Consistency Models were proposed in the paper "Consistency Models" by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever.

The abstract of the paper is as follows:

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.
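
To make the one-step versus multistep trade-off in the abstract concrete, here is a minimal sketch of multistep consistency sampling on a toy problem. Everything in it is an assumption for illustration: the 1-D Gaussian "data" distribution, the constants `EPS`, `T_MAX`, `MU`, `S`, and the helper names `consistency_fn` and `multistep_sample` are hypothetical and are not part of Diffusers. For this toy distribution the consistency function has a closed form, so no trained network is needed. The continuous noise levels used here are not directly interchangeable with the integer `timesteps` the pipeline accepts.

```python
import math
import random

EPS = 0.002       # smallest noise level (assumed value)
T_MAX = 80.0      # largest noise level (assumed value)
MU, S = 3.0, 0.5  # toy 1-D "data" distribution N(MU, S^2)


def consistency_fn(x, t):
    """Closed-form consistency function for the toy Gaussian data.

    With x_t = x_0 + t * z, the probability-flow ODE keeps (x_t - MU)
    proportional to sqrt(S^2 + t^2), so mapping a point at noise level t
    back to level EPS is a simple rescaling around MU.
    """
    return MU + (x - MU) * math.sqrt(S**2 + EPS**2) / math.sqrt(S**2 + t**2)


def multistep_sample(noise_levels, rng):
    """Sample by alternating denoising and re-noising.

    noise_levels must be decreasing and start at T_MAX; a single entry
    corresponds to one-step generation.
    """
    x = rng.gauss(0.0, T_MAX)               # start from pure noise
    x = consistency_fn(x, noise_levels[0])  # one-step sample
    for tau in noise_levels[1:]:
        z = rng.gauss(0.0, 1.0)
        x_noisy = x + math.sqrt(tau**2 - EPS**2) * z  # re-noise to level tau
        x = consistency_fn(x_noisy, tau)              # denoise again
    return x


rng = random.Random(0)
one_step = [multistep_sample([T_MAX], rng) for _ in range(5000)]
two_step = [multistep_sample([T_MAX, 1.0], rng) for _ in range(5000)]
```

In this toy setting the consistency function is exact, so extra steps do not improve the samples. With a learned, imperfect model, the extra denoise and re-noise rounds are what let multistep sampling trade compute for quality.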

Resources:

Available Checkpoints are:

Available Pipelines

Pipeline | Tasks | Demo | Colab
ConsistencyModelPipeline | Unconditional Image Generation | - | -

This pipeline was contributed by our community members dg845 and ayushtues ❤️

Usage Example

import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Onestep Sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")

For an additional speed-up, one can also make use of torch.compile. Multiple images can be generated in <1 second as follows:

import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_bedroom256_lpips checkpoint.
model_id_or_path = "openai/diffusers-cd_bedroom256_lpips"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Multistep sampling
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L83
for _ in range(10):
    image = pipe(timesteps=[17, 0]).images[0]
    image.show()
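
To check a timing claim like the one above on your own hardware, a small helper along the following lines may be useful. This is only a sketch: `time_calls` and its parameters are hypothetical names, not part of Diffusers or PyTorch. It runs a few warm-up calls first, because the first calls to a `torch.compile`d module include one-time compilation cost, and it accepts an optional `sync` callable because CUDA work is asynchronous and should be finished before the clock is read.

```python
import time


def time_calls(fn, n_calls=10, n_warmup=2, sync=None):
    """Return the mean wall-clock seconds per call of fn, after warm-up.

    sync, if given, is called before each clock reading so that pending
    asynchronous work (for example CUDA kernels) is included in the timing.
    """
    for _ in range(n_warmup):  # warm-up calls absorb one-time compilation cost
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_calls
```

With the pipeline from the example above, one could call `time_calls(lambda: pipe(timesteps=[17, 0]), sync=torch.cuda.synchronize)`; the measured time depends on the GPU, the checkpoint, and the PyTorch version.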

ConsistencyModelPipeline

class diffusers.ConsistencyModelPipeline

( unet: UNet2DModel, scheduler: CMStochasticIterativeScheduler )

Parameters

  • unet (UNet2DModel) — Unconditional or class-conditional U-Net architecture to denoise image latents.
  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the image latents. Currently only compatible with CMStochasticIterativeScheduler.

Pipeline for consistency models for unconditional or class-conditional image generation, as introduced in [1].

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, or running on a particular device).

[1] Song, Yang and Dhariwal, Prafulla and Chen, Mark and Sutskever, Ilya. “Consistency Models” https://arxiv.org/pdf/2303.01469

__call__

( batch_size: int = 1, class_labels: typing.Union[torch.Tensor, typing.List[int], int, NoneType] = None, num_inference_steps: int = 1, timesteps: typing.List[int] = None, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1 ) → ImagePipelineOutput or tuple

Parameters

  • batch_size (int, optional, defaults to 1) — The number of images to generate.
  • class_labels (torch.Tensor or List[int] or int, optional) — Optional class labels for conditioning class-conditional consistency models. Will not be used if the model is not class-conditional.
  • num_inference_steps (int, optional, defaults to 1) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
  • timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, num_inference_steps equally spaced timesteps are used. Must be in descending order.
  • generator (torch.Generator, optional) — One or a list of torch generator(s) to make generation deterministic.
  • latents (torch.FloatTensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different settings, such as different class labels. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
  • output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) and NumPy (np.array).
  • return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
  • callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
  • callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
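
The interplay of `timesteps`, `callback`, and `callback_steps` described above can be sketched without the library. The code below is an illustration under stated assumptions, not the pipeline's implementation: `check_descending` and `run_toy_loop` are hypothetical helpers, the "latents" are a plain float, and the library's actual checks and loop may differ in detail.

```python
def check_descending(timesteps):
    """Raise ValueError unless timesteps are strictly decreasing."""
    for prev, cur in zip(timesteps, timesteps[1:]):
        if cur >= prev:
            raise ValueError("`timesteps` must be in descending order.")


def run_toy_loop(timesteps, callback=None, callback_steps=1):
    """Walk through timesteps, invoking callback every callback_steps steps."""
    check_descending(timesteps)
    latents = 0.0  # stand-in for the latents tensor
    for step, t in enumerate(timesteps):
        latents += 1.0  # stand-in for one denoising update
        if callback is not None and step % callback_steps == 0:
            callback(step, t, latents)
    return latents


seen = []
run_toy_loop([22, 10, 0], callback=lambda s, t, l: seen.append((s, t)), callback_steps=2)
# seen is now [(0, 22), (2, 0)]: the callback fired on steps 0 and 2 only.
```

The callback receives the step index, the timestep, and the current latents, matching the argument order given for `callback` above.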

Returns

ImagePipelineOutput or tuple

ImagePipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

Examples:

>>> import torch

>>> from diffusers import ConsistencyModelPipeline

>>> device = "cuda"
>>> # Load the cd_imagenet64_l2 checkpoint.
>>> model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
>>> pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
>>> pipe.to(device)

>>> # Onestep Sampling
>>> image = pipe(num_inference_steps=1).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample.png")

>>> # Onestep sampling, class-conditional image generation
>>> # ImageNet-64 class label 145 corresponds to king penguins
>>> image = pipe(num_inference_steps=1, class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

>>> # Multistep sampling, class-conditional image generation
>>> # Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
>>> # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
>>> image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_multistep_sample_penguin.png")

enable_model_cpu_offload

( gpu_id = 0 )

Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forward method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload, but performance is much better due to the iterative execution of the unet.

enable_sequential_cpu_offload

( gpu_id = 0 )

Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models (for this pipeline, the unet) have their state dicts saved to CPU, are moved to torch.device('meta'), and are loaded to GPU only when their specific submodule has its forward method called. Note that offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.
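
Choosing between the two offload methods can be wrapped in a small helper, sketched below. The helper name `load_pipeline_with_offload` and its `mode` parameter are hypothetical; only `from_pretrained`, `enable_model_cpu_offload`, and `enable_sequential_cpu_offload` come from the documentation above. The sketch assumes accelerate is installed and that the pipeline is not moved to the GPU manually, so that the offload hooks control device placement.

```python
def load_pipeline_with_offload(model_id, mode="model", gpu_id=0):
    """Load a ConsistencyModelPipeline and enable one CPU-offload mode.

    mode="model" saves less memory but runs faster;
    mode="sequential" saves more memory but runs slower.
    """
    if mode not in ("model", "sequential"):
        raise ValueError(f"mode must be 'model' or 'sequential', got {mode!r}")

    # Imported lazily so the argument check above works without a GPU stack.
    import torch
    from diffusers import ConsistencyModelPipeline

    pipe = ConsistencyModelPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    # Assumption: no pipe.to("cuda") here; the offload hooks decide when
    # each module is placed on the GPU.
    if mode == "model":
        pipe.enable_model_cpu_offload(gpu_id=gpu_id)
    else:
        pipe.enable_sequential_cpu_offload(gpu_id=gpu_id)
    return pipe
```

For example, `load_pipeline_with_offload("openai/diffusers-cd_imagenet64_l2", mode="sequential")` would favor memory savings over speed.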