UniPCMultistepScheduler
UniPCMultistepScheduler is a training-free framework designed for fast sampling of diffusion models. It was introduced in UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models by Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu.
It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. UniPC is model-agnostic by design: it supports pixel-space and latent-space DPMs for both unconditional and conditional sampling, and it can be applied to both noise prediction and data prediction models. The corrector UniC can also be applied after any off-the-shelf solver to increase its order of accuracy.
The abstract from the paper is:
Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM is time-consuming due to the multiple evaluations of the denoising network, making it more and more important to accelerate the sampling of DPMs. Despite recent progress in designing fast samplers, existing methods still cannot generate satisfying images in many applications where fewer steps (e.g., <10) are favored. In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. Combining UniP and UniC, we propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs, which has a unified analytical form for any order and can significantly improve the sampling quality over previous methods, especially in extremely few steps. We evaluate our methods through extensive experiments including both unconditional and conditional sampling using pixel-space and latent-space DPMs. Our UniPC can achieve 3.87 FID on CIFAR10 (unconditional) and 7.51 FID on ImageNet 256×256 (conditional) with only 10 function evaluations. Code is available at this https URL.
Tips
It is recommended to set solver_order=2 for guided sampling and solver_order=3 for unconditional sampling.
Dynamic thresholding from Imagen is supported. For pixel-space diffusion models, you can set both predict_x0=True and thresholding=True to use dynamic thresholding. This thresholding method is unsuitable for latent-space diffusion models such as Stable Diffusion.
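The thresholding tip above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names quantile and dynamic_threshold are hypothetical, the input is a flat list of floats rather than a tensor, and the rule of keeping the threshold s within [1, sample_max_value] is an assumption based on the parameter descriptions on this page.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def dynamic_threshold(x0, ratio=0.995, sample_max_value=1.0):
    """Clip a predicted x0 to [-s, s] and rescale by s (hypothetical helper)."""
    # s is the `ratio` quantile of the absolute values of the prediction.
    s = quantile([abs(v) for v in x0], ratio)
    # Assumption: s is kept within [1, sample_max_value], so values that are
    # already inside [-1, 1] pass through unchanged.
    s = min(max(s, 1.0), sample_max_value)
    return [max(-s, min(s, v)) / s for v in x0]
```

With the default sample_max_value=1.0 this reduces to clipping the prediction to [-1, 1]. A larger sample_max_value lets outliers shrink the whole prediction instead of saturating, which only makes sense when the data itself lives in [-1, 1], as pixel values do.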
UniPCMultistepScheduler
class diffusers.UniPCMultistepScheduler
( num_train_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, beta_schedule: str = 'linear', trained_betas: Union = None, solver_order: int = 2, prediction_type: str = 'epsilon', thresholding: bool = False, dynamic_thresholding_ratio: float = 0.995, sample_max_value: float = 1.0, predict_x0: bool = True, solver_type: str = 'bh2', lower_order_final: bool = True, disable_corrector: List = [], solver_p: SchedulerMixin = None, use_karras_sigmas: Optional = False, timestep_spacing: str = 'linspace', steps_offset: int = 0, final_sigmas_type: Optional = 'zero', rescale_betas_zero_snr: bool = False )
Parameters
- num_train_timesteps (int, defaults to 1000) — The number of diffusion steps to train the model.
- beta_start (float, defaults to 0.0001) — The starting beta value of inference.
- beta_end (float, defaults to 0.02) — The final beta value.
- beta_schedule (str, defaults to "linear") — The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
- trained_betas (np.ndarray, optional) — Pass an array of betas directly to the constructor to bypass beta_start and beta_end.
- solver_order (int, defaults to 2) — The UniPC order, which can be any positive integer. The effective order of accuracy is solver_order + 1 due to the UniC. It is recommended to use solver_order=2 for guided sampling, and solver_order=3 for unconditional sampling.
- prediction_type (str, defaults to "epsilon", optional) — Prediction type of the scheduler function; can be epsilon (predicts the noise of the diffusion process), sample (directly predicts the noisy sample), or v_prediction (see section 2.4 of the Imagen Video paper).
- thresholding (bool, defaults to False) — Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such as Stable Diffusion.
- dynamic_thresholding_ratio (float, defaults to 0.995) — The ratio for the dynamic thresholding method. Valid only when thresholding=True.
- sample_max_value (float, defaults to 1.0) — The threshold value for dynamic thresholding. Valid only when thresholding=True and predict_x0=True.
- predict_x0 (bool, defaults to True) — Whether to use the updating algorithm on the predicted x0.
- solver_type (str, defaults to "bh2") — Solver type for UniPC. It is recommended to use bh1 for unconditional sampling when steps < 10, and bh2 otherwise.
- lower_order_final (bool, defaults to True) — Whether to use lower-order solvers in the final steps. Only valid for < 15 inference steps. This can stabilize the sampling of DPMSolver for steps < 15, especially for steps <= 10.
- disable_corrector (list, defaults to []) — The steps at which to disable the corrector, to mitigate the misalignment between epsilon_theta(x_t, c) and epsilon_theta(x_t^c, c), which can influence convergence for a large guidance scale. The corrector is usually disabled during the first few steps.
- solver_p (SchedulerMixin, defaults to None) — Any other scheduler. If specified, the algorithm becomes solver_p + UniC.
- use_karras_sigmas (bool, optional, defaults to False) — Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If True, the sigmas are determined according to a sequence of noise levels {σi}.
- timestep_spacing (str, defaults to "linspace") — The way the timesteps should be scaled. Refer to Table 2 of Common Diffusion Noise Schedules and Sample Steps are Flawed for more information.
- steps_offset (int, defaults to 0) — An offset added to the inference steps, as required by some model families.
- final_sigmas_type (str, defaults to "zero") — The final sigma value for the noise schedule during the sampling process. If "sigma_min", the final sigma is the same as the last sigma in the training schedule. If "zero", the final sigma is set to 0.
- rescale_betas_zero_snr (bool, defaults to False) — Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to --offset_noise.
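To make the beta_start, beta_end, and beta_schedule parameters concrete, the sketch below builds the three named schedules in pure Python. The helper names (linspace, make_betas, alphas_cumprod) are hypothetical, and the formulas are the commonly used ones for these schedule names. In particular, the cosine constants 0.008 and 1.008 and the 0.999 cap are assumptions, not values confirmed by this page.

```python
import math


def linspace(start, stop, num):
    """Return `num` evenly spaced floats from start to stop, inclusive."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def make_betas(schedule="linear", num_train_timesteps=1000,
               beta_start=0.0001, beta_end=0.02):
    """Build a list of betas for one of the three schedule names (sketch)."""
    n = num_train_timesteps
    if schedule == "linear":
        return linspace(beta_start, beta_end, n)
    if schedule == "scaled_linear":
        # Interpolate linearly in sqrt(beta), then square the result.
        return [r * r for r in linspace(math.sqrt(beta_start), math.sqrt(beta_end), n)]
    if schedule == "squaredcos_cap_v2":
        # Cosine alpha-bar schedule; each beta is capped at 0.999 (assumed constants).
        def alpha_bar(t):
            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
        return [min(1 - alpha_bar((i + 1) / n) / alpha_bar(i / n), 0.999) for i in range(n)]
    raise ValueError(f"unknown beta schedule: {schedule}")


def alphas_cumprod(betas):
    """Running product of (1 - beta), the signal fraction left at each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out
```

Passing trained_betas corresponds to skipping make_betas entirely and supplying the list yourself. Note that beta_start and beta_end play no role in the cosine branch of this sketch.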
UniPCMultistepScheduler is a training-free framework designed for the fast sampling of diffusion models.
This scheduler inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.
convert_model_output
( model_output: Tensor, *args, sample: Tensor = None, **kwargs ) → torch.Tensor
Parameters
- model_output (torch.Tensor) — The direct output from the learned diffusion model.
- timestep (int) — The current discrete timestep in the diffusion chain.
- sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
Returns
torch.Tensor
The converted model output.
Convert the model output to the corresponding type the UniPC algorithm needs.
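As an illustration of what such a conversion involves, the sketch below maps each prediction_type to a predicted x0, which is the target one would expect when predict_x0=True. It is a scalar sketch, not the library's code. It assumes the forward process x_t = alpha_t * x0 + sigma_t * eps with alpha_t^2 + sigma_t^2 = 1, and the v-prediction convention v = alpha_t * eps - sigma_t * x0. The function name to_x0 is hypothetical.

```python
def to_x0(model_output, sample, alpha_t, sigma_t, prediction_type="epsilon"):
    """Convert a model output into a predicted x0 (scalar sketch)."""
    if prediction_type == "epsilon":
        # Invert x_t = alpha_t * x0 + sigma_t * eps for x0.
        return (sample - sigma_t * model_output) / alpha_t
    if prediction_type == "sample":
        # The model output is already used as the x0 prediction.
        return model_output
    if prediction_type == "v_prediction":
        # Uses alpha_t**2 + sigma_t**2 == 1 (assumed variance-preserving process).
        return alpha_t * sample - sigma_t * model_output
    raise ValueError(f"unknown prediction_type: {prediction_type}")
```

All three branches return the same quantity, so the solver update that follows does not need to know which parameterization the model was trained with.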
multistep_uni_c_bh_update
( this_model_output: Tensor, *args, last_sample: Tensor = None, this_sample: Tensor = None, order: int = None, **kwargs ) → torch.Tensor
Parameters
- this_model_output (torch.Tensor) — The model outputs at x_t.
- this_timestep (int) — The current timestep t.
- last_sample (torch.Tensor) — The generated sample before the last predictor x_{t-1}.
- this_sample (torch.Tensor) — The generated sample after the last predictor x_{t}.
- order (int) — The p of UniC-p at this step. The effective order of accuracy should be order + 1.
Returns
torch.Tensor
The corrected sample tensor at the current timestep.
One step for the UniC (B(h) version).
multistep_uni_p_bh_update
( model_output: Tensor, *args, sample: Tensor = None, order: int = None, **kwargs ) → torch.Tensor
Parameters
- model_output (torch.Tensor) — The direct output from the learned diffusion model at the current timestep.
- prev_timestep (int) — The previous discrete timestep in the diffusion chain.
- sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
- order (int) — The order of UniP at this timestep (corresponds to the p in UniPC-p).
Returns
torch.Tensor
The sample tensor at the previous timestep.
One step for the UniP (B(h) version). Alternatively, self.solver_p is used if it is specified.
scale_model_input
( sample: Tensor, *args, **kwargs ) → torch.Tensor
Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.
set_begin_index
( begin_index: int = 0 )
Sets the begin index for the scheduler. This function should be run from the pipeline before inference.
set_timesteps
( num_inference_steps: int, device: Union = None )
Sets the discrete timesteps used for the diffusion chain (to be run before inference).
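As a rough picture of what choosing discrete timesteps can look like with timestep_spacing="linspace", the sketch below spreads num_inference_steps points evenly over the training range and returns them from noisiest to cleanest. The function name linspace_timesteps is hypothetical, and the exact rounding and endpoint handling are assumptions; the library may differ in these details, and the sketch ignores steps_offset and use_karras_sigmas.

```python
def linspace_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced integer timesteps in descending order (sketch)."""
    last = num_train_timesteps - 1
    # num_inference_steps + 1 grid points from 0 to the last training timestep.
    grid = [round(i * last / num_inference_steps)
            for i in range(num_inference_steps + 1)]
    # Reverse so sampling starts at the noisiest timestep, then drop the
    # trailing 0 so exactly num_inference_steps timesteps remain.
    return grid[::-1][:-1]
```

Calling linspace_timesteps(10) yields ten timesteps that start at 999 and decrease in steps of roughly 100.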
step
( model_output: Tensor, timestep: Union, sample: Tensor, return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.Tensor) — The direct output from the learned diffusion model.
- timestep (int) — The current discrete timestep in the diffusion chain.
- sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
- return_dict (bool) — Whether or not to return a SchedulerOutput or tuple.
Returns
SchedulerOutput or tuple
If return_dict is True, SchedulerOutput is returned, otherwise a tuple is returned where the first element is the sample tensor.
Predict the sample from the previous timestep by reversing the SDE. This function propagates the sample with the multistep UniPC.
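A multistep solver cannot run at full order from the first call, because it has no earlier model outputs to reuse yet. The sketch below shows one plausible way the per-step order could be chosen from solver_order and lower_order_final. The function name order_schedule and the exact rule are assumptions for illustration: warm up from order 1, and when lower_order_final is enabled, never use an order larger than the number of remaining steps.

```python
def order_schedule(num_inference_steps, solver_order=2, lower_order_final=True):
    """Return the solver order used at each step index (illustrative rule)."""
    orders = []
    warmup = 0  # how many earlier model outputs are available, capped at solver_order
    for step_index in range(num_inference_steps):
        order = solver_order
        if lower_order_final:
            # Lower the order when few steps remain.
            order = min(order, num_inference_steps - step_index)
        # During warm-up, the order is limited by the stored history.
        order = min(order, warmup + 1)
        orders.append(order)
        if warmup < solver_order:
            warmup += 1
    return orders
```

Under this rule, a 10-step run with solver_order=3 ramps up over the first three steps, stays at order 3 in the middle, and ramps down over the last two steps.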
SchedulerOutput
class diffusers.schedulers.scheduling_utils.SchedulerOutput
( prev_sample: Tensor )
Base class for the output of a scheduler’s step function.