Diffusers documentation


You are viewing v0.18.2 version. A newer version v0.35.1 is available.

Consistency Models

Consistency models were proposed in the paper Consistency Models by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever.

The abstract of the paper is as follows:

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.

Resources:

Available Checkpoints are:

cd_imagenet64_l2 (64x64 resolution) openai/diffusers-cd_imagenet64_l2
cd_imagenet64_lpips (64x64 resolution) openai/diffusers-cd_imagenet64_lpips
ct_imagenet64 (64x64 resolution) openai/diffusers-ct_imagenet64
cd_bedroom256_l2 (256x256 resolution) openai/diffusers-cd_bedroom256_l2
cd_bedroom256_lpips (256x256 resolution) openai/diffusers-cd_bedroom256_lpips
ct_bedroom256 (256x256 resolution) openai/diffusers-ct_bedroom256
cd_cat256_l2 (256x256 resolution) openai/diffusers-cd_cat256_l2
cd_cat256_lpips (256x256 resolution) openai/diffusers-cd_cat256_lpips
ct_cat256 (256x256 resolution) openai/diffusers-ct_cat256
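
The repository IDs in this list share one pattern, `openai/diffusers-<checkpoint name>`. The sketch below assumes that pattern holds for every listed checkpoint and maps a checkpoint name to its repository ID. `CHECKPOINT_RESOLUTIONS` and `repo_id_for` are hypothetical names for illustration, not part of diffusers.

```python
# Checkpoint names and resolutions copied from the list above.
CHECKPOINT_RESOLUTIONS = {
    "cd_imagenet64_l2": 64,
    "cd_imagenet64_lpips": 64,
    "ct_imagenet64": 64,
    "cd_bedroom256_l2": 256,
    "cd_bedroom256_lpips": 256,
    "ct_bedroom256": 256,
    "cd_cat256_l2": 256,
    "cd_cat256_lpips": 256,
    "ct_cat256": 256,
}


def repo_id_for(checkpoint: str) -> str:
    """Return the Hub repository ID for a listed checkpoint name.

    Assumption: every checkpoint lives at ``openai/diffusers-<name>``.
    """
    if checkpoint not in CHECKPOINT_RESOLUTIONS:
        raise ValueError(f"Unknown checkpoint: {checkpoint!r}")
    return f"openai/diffusers-{checkpoint}"
```

The returned string can be passed to `ConsistencyModelPipeline.from_pretrained`, and the resolution lookup tells you the output size to expect from each checkpoint.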

Available Pipelines

Pipeline	Tasks	Demo	Colab
ConsistencyModelPipeline	Unconditional Image Generation

This pipeline was contributed by our community members dg845 and ayushtues ❤️

Usage Example

import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
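
Custom `timesteps` must be given in descending order. A minimal pre-flight check might look like the sketch below. `check_timesteps` is a hypothetical helper, not a diffusers function, and treating the order as strictly descending (no repeated values) is an assumption; the scheduler performs its own validation.

```python
from typing import List, Sequence


def check_timesteps(timesteps: Sequence[int]) -> List[int]:
    """Return the timesteps as a list if they are strictly descending.

    Assumption: repeated values are rejected along with ascending ones.
    """
    steps = list(timesteps)
    if not steps:
        raise ValueError("timesteps must not be empty")
    # Compare each timestep with the one that follows it.
    for earlier, later in zip(steps, steps[1:]):
        if later >= earlier:
            raise ValueError(f"timesteps must be in descending order, got {steps}")
    return steps
```

For example, `pipe(num_inference_steps=None, timesteps=check_timesteps([22, 0]), class_labels=145)` fails early with a readable message if the list is accidentally reversed.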

For an additional speed-up, you can compile the U-Net with torch.compile. Multiple images can then be generated in under one second:

import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_bedroom256_lpips checkpoint.
model_id_or_path = "openai/diffusers-cd_bedroom256_lpips"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Multistep sampling
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L83
for _ in range(10):
    image = pipe(timesteps=[17, 0]).images[0]
    image.show()
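
When measuring a compiled model, the first call typically includes compilation time, so it helps to exclude warm-up calls from the measurement. The helper below is a generic sketch for doing that; `average_seconds` and its parameters are illustrative names, not part of diffusers or PyTorch. On a GPU, passing `torch.cuda.synchronize` as `sync` should give more trustworthy wall-clock numbers, because CUDA kernels are launched asynchronously.

```python
import time
from typing import Callable, Optional


def average_seconds(
    fn: Callable[[], object],
    warmup: int = 1,
    runs: int = 5,
    sync: Optional[Callable[[], object]] = None,
) -> float:
    """Average wall-clock seconds per call of fn, excluding warm-up calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls absorb one-time costs such as compilation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    # Wait for any asynchronous work to finish before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / runs
```

A possible use with the pipeline above is `average_seconds(lambda: pipe(timesteps=[17, 0]), sync=torch.cuda.synchronize)`.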

ConsistencyModelPipeline

class diffusers.ConsistencyModelPipeline

< source >

( unet: UNet2DModel, scheduler: CMStochasticIterativeScheduler )

Parameters

unet (UNet2DModel) — Unconditional or class-conditional U-Net architecture to denoise image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the image latents. Currently only compatible with CMStochasticIterativeScheduler.

Pipeline for consistency models for unconditional or class-conditional image generation, as introduced in [1].

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

[1] Song, Yang and Dhariwal, Prafulla and Chen, Mark and Sutskever, Ilya. “Consistency Models” https://arxiv.org/pdf/2303.01469

__call__

< source >

( batch_size: int = 1, class_labels: typing.Union[torch.Tensor, typing.List[int], int, NoneType] = None, num_inference_steps: int = 1, timesteps: typing.List[int] = None, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1 ) → ImagePipelineOutput or tuple

Parameters

batch_size (int, optional, defaults to 1) — The number of images to generate.
class_labels (torch.Tensor or List[int] or int, optional) — Optional class labels for conditioning class-conditional consistency models. Will not be used if the model is not class-conditional.
num_inference_steps (int, optional, defaults to 1) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, num_inference_steps equally spaced timesteps are used. Must be in descending order.
generator (torch.Generator, optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.FloatTensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different settings, such as class labels. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
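
As an illustration of `callback` and `callback_steps`, the sketch below builds a callback with the documented `(step, timestep, latents)` signature that records which steps were reported. `make_step_recorder` is a hypothetical helper, not part of diffusers.

```python
from typing import Any, Callable, List, Tuple


def make_step_recorder() -> Tuple[Callable[[int, Any, Any], None], List[Tuple[int, Any]]]:
    """Create a callback and the list it appends (step, timestep) pairs to."""
    history: List[Tuple[int, Any]] = []

    def record(step: int, timestep: Any, latents: Any) -> None:
        # The pipeline passes the current latents as the third argument;
        # this recorder ignores them and keeps only the step bookkeeping.
        history.append((step, timestep))

    return record, history
```

A possible use is `record, history = make_step_recorder()` followed by `pipe(num_inference_steps=None, timesteps=[22, 0], callback=record, callback_steps=1)`, after which `history` holds one entry per reported step.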

Returns

ImagePipelineOutput or tuple

ImagePipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.
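
Because the return type depends on `return_dict`, calling code sometimes has to handle both shapes. The sketch below assumes only what is stated here: either an object with an `images` attribute, or a tuple whose first element holds the generated images. `extract_images` is a hypothetical name, not a diffusers function.

```python
from typing import Any, List


def extract_images(result: Any) -> List[Any]:
    """Return the generated images from either pipeline return shape."""
    if isinstance(result, tuple):
        # return_dict=False: the first tuple element holds the images.
        return list(result[0])
    # return_dict=True: the output object exposes an `images` attribute.
    return list(result.images)
```

With this helper, `extract_images(pipe(num_inference_steps=1))[0]` and `extract_images(pipe(num_inference_steps=1, return_dict=False))[0]` both yield the first generated image.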

Examples:

>>> import torch

>>> from diffusers import ConsistencyModelPipeline

>>> device = "cuda"
>>> # Load the cd_imagenet64_l2 checkpoint.
>>> model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
>>> pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
>>> pipe.to(device)

>>> # One-step sampling
>>> image = pipe(num_inference_steps=1).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample.png")

>>> # One-step sampling, class-conditional image generation
>>> # ImageNet-64 class label 145 corresponds to king penguins
>>> image = pipe(num_inference_steps=1, class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

>>> # Multistep sampling, class-conditional image generation
>>> # Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
>>> # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
>>> image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_multistep_sample_penguin.png")

enable_model_cpu_offload

< source >

( gpu_id = 0 )

Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forward method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload, but performance is much better due to the iterative execution of the unet.

enable_sequential_cpu_offload

< source >

( gpu_id = 0 )

Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models (for this pipeline, the unet) have their state dicts saved to CPU, are moved to torch.device('meta'), and are loaded to the GPU only when a specific submodule has its forward method called. Note that offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.
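
The two offloading methods trade memory for speed in opposite directions. The sketch below selects one by name. `enable_offload` is a hypothetical wrapper around the two methods documented above, and it assumes `accelerate` is installed, since both methods rely on it.

```python
def enable_offload(pipe, strategy: str = "model", gpu_id: int = 0) -> None:
    """Enable one of the two CPU offloading modes on a pipeline.

    "model" moves whole models to the GPU one at a time (faster, saves less
    memory); "sequential" offloads per submodule (slower, saves more memory).
    """
    if strategy == "model":
        pipe.enable_model_cpu_offload(gpu_id=gpu_id)
    elif strategy == "sequential":
        pipe.enable_sequential_cpu_offload(gpu_id=gpu_id)
    else:
        raise ValueError(f"Unknown offload strategy: {strategy!r}")
```

A reasonable default is `enable_offload(pipe, "model")`, switching to `"sequential"` only when memory is still too tight.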
