Consistency Model Multistep Scheduler

Overview

The multistep and onestep scheduler (Algorithm 1) was introduced alongside consistency models in the paper Consistency Models by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever [1]. It is based on the original consistency models implementation, and should generate good samples from ConsistencyModelPipeline in one or a small number of steps.
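For example, a minimal usage sketch (the checkpoint id below is an assumption; substitute any checkpoint compatible with ConsistencyModelPipeline):

```python
import torch
from diffusers import ConsistencyModelPipeline

# The model id is assumed for illustration; any consistency model
# checkpoint compatible with ConsistencyModelPipeline works here.
pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
).to("cuda")

# Onestep sampling
image = pipe(num_inference_steps=1).images[0]

# Multistep sampling via an explicit timestep schedule
image = pipe(num_inference_steps=None, timesteps=[22, 0]).images[0]
```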

CMStochasticIterativeScheduler

class diffusers.CMStochasticIterativeScheduler


( num_train_timesteps: int = 40 sigma_min: float = 0.002 sigma_max: float = 80.0 sigma_data: float = 0.5 s_noise: float = 1.0 rho: float = 7.0 clip_denoised: bool = True )

Parameters

  • num_train_timesteps (int) — The number of diffusion steps used to train the model.
  • sigma_min (float) — Minimum noise magnitude in the sigma schedule. This was set to 0.002 in the original implementation.
  • sigma_max (float) — Maximum noise magnitude in the sigma schedule. This was set to 80.0 in the original implementation.
  • sigma_data (float) — The standard deviation of the data distribution, following the EDM paper [2]. This was set to 0.5 in the original implementation, which is also the original value suggested in the EDM paper.
  • s_noise (float) — The amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011]. This was set to 1.0 in the original implementation.
  • rho (float) — The rho parameter used for calculating the Karras sigma schedule, introduced in the EDM paper [2]. This was set to 7.0 in the original implementation, which is also the original value suggested in the EDM paper.
  • clip_denoised (bool) — Whether to clip the denoised outputs to (-1, 1). Defaults to True.
  • timesteps (List or np.ndarray or torch.Tensor, optional) — Optionally, an explicit timestep schedule can be specified. The timesteps are expected to be in increasing order.

Multistep and onestep sampling for consistency models from Song et al. 2023 [1]. This implements Algorithm 1 in the paper [1].

[1] Song, Yang, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. “Consistency Models.” https://arxiv.org/pdf/2303.01469
[2] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364

ConfigMixin takes care of storing all config attributes that are passed to the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.
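For example (a minimal sketch; the save directory name is arbitrary):

```python
from diffusers import CMStochasticIterativeScheduler

scheduler = CMStochasticIterativeScheduler()
print(scheduler.config.num_train_timesteps)  # 40, the default from __init__

# "cm_scheduler" is an arbitrary directory used for illustration.
scheduler.save_pretrained("cm_scheduler")
scheduler = CMStochasticIterativeScheduler.from_pretrained("cm_scheduler")
```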

get_scalings_for_boundary_condition


( sigma ) → tuple

Parameters

  • sigma (torch.FloatTensor) — The current sigma in the Karras sigma schedule.

Returns

tuple

A two-element tuple where c_skip (which weights the current sample) is the first element and c_out (which weights the consistency model output) is the second element.

Gets the scalings used in the consistency model parameterization, following Appendix C of the original paper. This enforces the consistency model boundary condition.

Note that epsilon in the equations for c_skip and c_out is set to sigma_min.
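A standalone sketch of these scalings, assuming the formulas from Appendix C of [1] with epsilon replaced by sigma_min and the default values from the scheduler config:

```python
def get_scalings_for_boundary_condition(sigma, sigma_min=0.002, sigma_data=0.5):
    # c_skip -> 1 and c_out -> 0 as sigma -> sigma_min, so the parameterized
    # model reduces to the identity at the boundary, as consistency models require.
    c_skip = sigma_data**2 / ((sigma - sigma_min) ** 2 + sigma_data**2)
    c_out = (sigma - sigma_min) * sigma_data / (sigma**2 + sigma_data**2) ** 0.5
    return c_skip, c_out
```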

scale_model_input


( sample: FloatTensor timestep: typing.Union[float, torch.FloatTensor] ) → torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — The input sample.
  • timestep (float or torch.FloatTensor) — The current timestep in the diffusion chain.

Returns

torch.FloatTensor

The scaled input sample.

Scales the consistency model input by 1 / (sigma**2 + sigma_data**2) ** 0.5, following the input preconditioning from the EDM paper [2].
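In functional form, this corresponds to the following sketch (a simplification: the real method takes a timestep and looks up the corresponding sigma, rather than taking sigma directly):

```python
import torch

def scale_model_input(sample: torch.FloatTensor, sigma: float, sigma_data: float = 0.5) -> torch.FloatTensor:
    # EDM-style input preconditioning: c_in(sigma) = 1 / sqrt(sigma**2 + sigma_data**2)
    return sample / ((sigma**2 + sigma_data**2) ** 0.5)
```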

set_timesteps


( num_inference_steps: typing.Optional[int] = None device: typing.Union[str, torch.device] = None timesteps: typing.Optional[typing.List[int]] = None )

Parameters

  • num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — The device to which the timesteps should be moved. If None, the timesteps are not moved.
  • timesteps (List[int], optional) — Custom timesteps used to support arbitrary spacing between timesteps. If None, the default strategy of equal spacing between timesteps is used. If passed, num_inference_steps must be None.

Sets the timesteps used for the diffusion chain. This is a supporting function that should be run before inference.
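For example (a minimal sketch; the custom schedule [22, 0] follows the consistency models examples and is given in descending order, matching how the scheduler steps from high to low noise):

```python
from diffusers import CMStochasticIterativeScheduler

scheduler = CMStochasticIterativeScheduler()

# Default equal spacing derived from the number of inference steps
scheduler.set_timesteps(num_inference_steps=2)

# Or an explicit, arbitrarily spaced schedule (num_inference_steps must be None)
scheduler.set_timesteps(timesteps=[22, 0])
```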

sigma_to_t


( sigmas: typing.Union[float, numpy.ndarray] ) → float or np.ndarray

Parameters

  • sigmas (float or np.ndarray) — A single Karras sigma or an array of Karras sigmas.

Returns

float or np.ndarray

A scaled input timestep or an array of scaled input timesteps.

Gets scaled timesteps from the Karras sigmas, for input to the consistency model.
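A sketch of this conversion, assuming the rescaling used in the original consistency models implementation, t = 1000 * 0.25 * ln(sigma + 1e-44):

```python
import numpy as np

def sigma_to_t(sigmas):
    # Rescale Karras sigmas into the timestep range the consistency model
    # expects; the small constant avoids log(0) at sigma = 0.
    if not isinstance(sigmas, np.ndarray):
        sigmas = np.array(sigmas, dtype=np.float64)
    return 1000 * 0.25 * np.log(sigmas + 1e-44)
```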

step


( model_output: FloatTensor timestep: typing.Union[float, torch.FloatTensor] sample: FloatTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) → ~schedulers.scheduling_utils.CMStochasticIterativeSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — The direct output from the learned diffusion model.
  • timestep (float or torch.FloatTensor) — The current timestep in the diffusion chain.
  • sample (torch.FloatTensor) — The current instance of the sample being created by the diffusion process.
  • generator (torch.Generator, optional) — A random number generator.
  • return_dict (bool) — Whether to return a CMStochasticIterativeSchedulerOutput instead of a plain tuple.

Returns

~schedulers.scheduling_utils.CMStochasticIterativeSchedulerOutput or tuple

~schedulers.scheduling_utils.CMStochasticIterativeSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predicts the sample at the previous timestep by reversing the SDE. This is the core function for propagating the diffusion process from the learned model outputs (most often the predicted noise).
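Putting the pieces together, a manual multistep sampling loop might look like the following sketch. DummyUNet is a hypothetical stand-in for a trained consistency model so the example runs on its own; a real model returns an object with a .sample tensor in the same way:

```python
import torch
from dataclasses import dataclass
from diffusers import CMStochasticIterativeScheduler

@dataclass
class UNetOutput:
    sample: torch.Tensor

class DummyUNet(torch.nn.Module):
    # Hypothetical placeholder for a trained consistency model.
    def forward(self, x, t):
        return UNetOutput(sample=torch.zeros_like(x))

unet = DummyUNet()
scheduler = CMStochasticIterativeScheduler()
scheduler.set_timesteps(num_inference_steps=2)

# Scale the initial noise to the maximum sigma of the schedule.
sample = torch.randn(1, 3, 64, 64) * scheduler.init_noise_sigma
for t in scheduler.timesteps:
    scaled = scheduler.scale_model_input(sample, t)
    model_output = unet(scaled, t).sample
    sample = scheduler.step(model_output, t, sample).prev_sample
```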