Diffusers documentation

Variance exploding, stochastic sampling from Karras et al.

You are viewing version v0.12.0. A newer version, v0.35.1, is available.

Overview

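The original paper, “Elucidating the Design Space of Diffusion-Based Generative Models” by Karras et al., is available at https://arxiv.org/abs/2206.00364.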

KarrasVeScheduler

class diffusers.KarrasVeScheduler

( sigma_min: float = 0.02, sigma_max: float = 100, s_noise: float = 1.007, s_churn: float = 80, s_min: float = 0.05, s_max: float = 50 )

Parameters

sigma_min (float) — minimum noise magnitude
sigma_max (float) — maximum noise magnitude
s_noise (float) — the amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011].
s_churn (float) — the parameter controlling the overall amount of stochasticity. A reasonable range is [0, 100].
s_min (float) — the start value of the sigma range where we add noise (enable stochasticity). A reasonable range is [0, 10].
s_max (float) — the end value of the sigma range where we add noise. A reasonable range is [0.2, 80].

Stochastic sampling from Karras et al. [1], tailored to Variance-Exploding (VE) models [2]. See Algorithm 2 and the VE column of Table 1 in [1] for reference.

[1] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364
[2] Song, Yang, et al. “Score-based generative modeling through stochastic differential equations.” https://arxiv.org/abs/2011.13456

ConfigMixin stores all config attributes that are passed to the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details on the parameters, see Appendix E of the original paper, “Elucidating the Design Space of Diffusion-Based Generative Models” (https://arxiv.org/abs/2206.00364). The grid search values used to find the optimal {s_noise, s_churn, s_min, s_max} for a specific model are described in Table 5 of the paper.

add_noise_to_input

( sample: FloatTensor, sigma: float, generator: typing.Optional[torch._C.Generator] = None )

Explicit Langevin-like “churn” step of adding noise to the sample according to a factor gamma_i ≥ 0 to reach a higher noise level sigma_hat = sigma_i + gamma_i*sigma_i.

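Parameters

sample (torch.FloatTensor) — input sample
sigma (float) — current noise level sigma_i
generator (torch.Generator, optional) — random number generator used to draw the added noise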
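The churn step described above can be sketched in plain Python. This is a minimal illustration following Algorithm 2 of [1], not the library's implementation: the helper name `churn`, the `num_steps` argument, and the use of a list of floats in place of a tensor are all assumptions made for this example. The rule for gamma_i (non-zero only when sigma lies in [s_min, s_max], capped at sqrt(2) - 1) is taken from the paper.

```python
import math
import random


def churn(sample, sigma, num_steps, s_churn=80.0, s_min=0.05, s_max=50.0,
          s_noise=1.007, rng=None):
    """Hypothetical helper: return (sample_hat, sigma_hat) for a list of floats."""
    rng = rng or random.Random()
    # gamma_i is positive only inside [s_min, s_max] and never exceeds sqrt(2) - 1.
    if s_min <= sigma <= s_max:
        gamma = min(s_churn / num_steps, math.sqrt(2.0) - 1.0)
    else:
        gamma = 0.0
    # Raised noise level: sigma_hat = sigma_i + gamma_i * sigma_i.
    sigma_hat = sigma + gamma * sigma
    # Standard deviation of the extra noise needed to move from sigma to sigma_hat.
    extra_std = math.sqrt(sigma_hat**2 - sigma**2)
    # The fresh noise is drawn with standard deviation s_noise.
    sample_hat = [x + extra_std * s_noise * rng.gauss(0.0, 1.0) for x in sample]
    return sample_hat, sigma_hat


noisy, raised = churn([0.0, 1.0], sigma=10.0, num_steps=50)
unchanged, same_sigma = churn([0.0, 1.0], sigma=80.0, num_steps=50)
```

In the second call sigma is above s_max, so gamma_i is zero and both the sample and the noise level come back unchanged.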

scale_model_input

( sample: FloatTensor, timestep: typing.Optional[int] = None ) → torch.FloatTensor

Parameters

sample (torch.FloatTensor) — input sample
timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int, device: typing.Union[str, torch.device] = None )

Parameters

num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
device (str or torch.device, optional) — the device to which the timesteps are moved.

Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.
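For orientation, the VE column of Table 1 in [1] discretizes time as t_i = sigma_max^2 * (sigma_min^2 / sigma_max^2)^(i / (N - 1)) with sigma(t) = sqrt(t). The sketch below evaluates that formula in plain Python. The function name `ve_time_steps` is hypothetical, and the code illustrates the paper's formula rather than the exact values the scheduler stores internally.

```python
import math


def ve_time_steps(num_inference_steps, sigma_min=0.02, sigma_max=100.0):
    """Hypothetical helper: t_i from the VE column of Table 1, for i = 0..N-1."""
    if num_inference_steps < 2:
        raise ValueError("num_inference_steps must be at least 2")
    ratio = sigma_min**2 / sigma_max**2
    last = num_inference_steps - 1
    # Geometric interpolation from sigma_max^2 (i = 0) down to sigma_min^2 (i = N - 1).
    return [sigma_max**2 * ratio ** (i / last) for i in range(num_inference_steps)]


steps = ve_time_steps(5)
# For VE models sigma(t) = sqrt(t), so the noise levels run from sigma_max to sigma_min.
sigmas = [math.sqrt(t) for t in steps]
```

The resulting noise levels decrease monotonically, which matches the order in which a sampler visits them.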

step

( model_output: FloatTensor, sigma_hat: float, sigma_prev: float, sample_hat: FloatTensor, return_dict: bool = True ) → KarrasVeOutput or tuple

Parameters

model_output (torch.FloatTensor) — direct output from learned diffusion model.
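sigma_hat (float) — increased noise level after the churn step (see add_noise_to_input)
sigma_prev (float) — noise level of the next step in the sampling chain, lower than sigma_hat
sample_hat (torch.FloatTensor) — sample after noise has been added to reach sigma_hat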
return_dict (bool) — whether to return a KarrasVeOutput (True) or a plain tuple (False)

KarrasVeOutput — updated sample in the diffusion chain and derivative (TODO double check).

Returns

KarrasVeOutput or tuple

KarrasVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
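As a rough illustration of this update, the first-order (Euler) step of Algorithm 2 in [1] moves the sample from noise level sigma_hat to sigma_prev along the derivative d = (sample_hat - denoised) / sigma_hat. The sketch below is written under assumptions: `euler_step` is a hypothetical name, samples are plain lists of floats, and the denoised estimate is passed in directly instead of being derived from model_output, because how the network output maps to a denoised sample depends on the model.

```python
def euler_step(sample_hat, denoised, sigma_hat, sigma_prev):
    """Hypothetical helper: one Euler step from sigma_hat down to sigma_prev."""
    # Derivative of the sample with respect to the noise level, evaluated at sigma_hat.
    derivative = [(x - d) / sigma_hat for x, d in zip(sample_hat, denoised)]
    # Move along the derivative by the (negative) change in noise level.
    sample_prev = [x + (sigma_prev - sigma_hat) * g
                   for x, g in zip(sample_hat, derivative)]
    return sample_prev, derivative


# A sample at 2.0 whose denoised estimate is 0.0, stepping from sigma 2.0 to 1.0.
sample_prev, derivative = euler_step([2.0], [0.0], sigma_hat=2.0, sigma_prev=1.0)
```

Returning the derivative alongside the new sample mirrors the fact that a later correction step needs both.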

step_correct

( model_output: FloatTensor, sigma_hat: float, sigma_prev: float, sample_hat: FloatTensor, sample_prev: FloatTensor, derivative: FloatTensor, return_dict: bool = True ) → prev_sample (TODO)

Parameters

model_output (torch.FloatTensor) — direct output from learned diffusion model.
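sigma_hat (float) — increased noise level after the churn step (see add_noise_to_input)
sigma_prev (float) — noise level of the next step in the sampling chain, lower than sigma_hat
sample_hat (torch.FloatTensor) — sample after noise has been added to reach sigma_hat
sample_prev (torch.FloatTensor) — sample predicted by step at noise level sigma_prev
derivative (torch.FloatTensor) — derivative computed by step at noise level sigma_hat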
return_dict (bool) — whether to return a KarrasVeOutput (True) or a plain tuple (False)

Returns

prev_sample (TODO)

updated sample in the diffusion chain. derivative (TODO): TODO

Correct the predicted sample sample_prev using model_output, the network output evaluated at sample_prev and sigma_prev. This corresponds to the second-order correction step of Algorithm 2 in [1].
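The second-order correction of Algorithm 2 in [1] averages the derivative at sigma_hat with a second derivative evaluated at the predicted sample, then repeats the step from sample_hat. The following plain-Python sketch shows that arithmetic under stated assumptions: `heun_correct` is a hypothetical name, samples are lists of floats, and the denoised estimate at sample_prev is supplied directly instead of being computed from model_output. It requires sigma_prev to be non-zero, since it divides by that value.

```python
def heun_correct(sample_hat, sample_prev, denoised_prev, derivative,
                 sigma_hat, sigma_prev):
    """Hypothetical helper: apply the second-order correction to an Euler prediction."""
    if sigma_prev == 0:
        raise ValueError("the correction is undefined at sigma_prev == 0")
    # Derivative evaluated at the predicted sample and the lower noise level.
    derivative_corr = [(x - d) / sigma_prev
                       for x, d in zip(sample_prev, denoised_prev)]
    # Redo the step from sample_hat with the average of the two derivatives.
    return [x + (sigma_prev - sigma_hat) * (0.5 * g + 0.5 * gc)
            for x, g, gc in zip(sample_hat, derivative, derivative_corr)]


# Euler predicted 1.0 from 2.0 with derivative 1.0; the denoised estimate at 1.0 is 0.5.
corrected = heun_correct([2.0], [1.0], [0.5], [1.0], sigma_hat=2.0, sigma_prev=1.0)
```

In this toy case the second derivative is 0.5, so the averaged slope is 0.75 and the corrected sample lands at 1.25 instead of 1.0.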
