Diffusers documentation

Variance exploding, stochastic sampling from Karras et al.

You are viewing version v0.16.0. A newer version, v0.27.2, is available.


Overview

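The original paper, “Elucidating the Design Space of Diffusion-Based Generative Models” by Karras et al., can be found at https://arxiv.org/abs/2206.00364.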

KarrasVeScheduler

class diffusers.KarrasVeScheduler

( sigma_min: float = 0.02, sigma_max: float = 100, s_noise: float = 1.007, s_churn: float = 80, s_min: float = 0.05, s_max: float = 50 )

Parameters

  • sigma_min (float) — minimum noise magnitude
  • sigma_max (float) — maximum noise magnitude
  • s_noise (float) — the amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011].
  • s_churn (float) — the parameter controlling the overall amount of stochasticity. A reasonable range is [0, 100].
  • s_min (float) — the start value of the sigma range where we add noise (enable stochasticity). A reasonable range is [0, 10].
  • s_max (float) — the end value of the sigma range where we add noise. A reasonable range is [0.2, 80].

Stochastic sampling from Karras et al. [1], tailored to Variance Exploding (VE) models [2]. See Algorithm 2 and the VE column of Table 1 in [1] for reference.

[1] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364

[2] Song, Yang, et al. “Score-based generative modeling through stochastic differential equations.” https://arxiv.org/abs/2011.13456

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details on the parameters, see Appendix E of the original paper, “Elucidating the Design Space of Diffusion-Based Generative Models” (https://arxiv.org/abs/2206.00364). The grid search values used to find the optimal {s_noise, s_churn, s_min, s_max} for a specific model are described in Table 5 of the paper.

add_noise_to_input

( sample: FloatTensor, sigma: float, generator: typing.Optional[torch._C.Generator] = None )

Explicit Langevin-like “churn” step of adding noise to the sample according to a factor gamma_i ≥ 0 to reach a higher noise level sigma_hat = sigma_i + gamma_i*sigma_i.

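Parameters

  • sample (torch.FloatTensor) — input sample
  • sigma (float) — current noise level sigma_i
  • generator (torch.Generator, optional) — random number generator used to draw the added noise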
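The churn step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name `churn`, the list-of-floats sample, and the `rng` argument are hypothetical, and the rule for gamma_i (including the cap at sqrt(2) - 1) is taken from Algorithm 2 of [1].

```python
import math
import random


def churn(sample, sigma, s_churn, s_min, s_max, s_noise, num_steps, rng):
    """Raise the noise level of `sample` from sigma to sigma_hat (illustrative only)."""
    # gamma is non-zero only when sigma lies in [s_min, s_max];
    # the cap sqrt(2) - 1 follows Algorithm 2 of Karras et al.
    if s_min <= sigma <= s_max:
        gamma = min(s_churn / num_steps, math.sqrt(2) - 1)
    else:
        gamma = 0.0
    sigma_hat = sigma + gamma * sigma
    # Standard deviation of the fresh noise needed to move from sigma to sigma_hat.
    extra_std = math.sqrt(sigma_hat**2 - sigma**2)
    # The noise is drawn with standard deviation s_noise before scaling.
    sample_hat = [x + extra_std * s_noise * rng.gauss(0.0, 1.0) for x in sample]
    return sample_hat, sigma_hat


rng = random.Random(0)
x = [0.5, -1.0, 2.0]
x_hat, sigma_hat = churn(
    x, sigma=10.0, s_churn=80, s_min=0.05, s_max=50, s_noise=1.007, num_steps=50, rng=rng
)
```

With these defaults and 50 steps, s_churn / num_steps exceeds the cap, so gamma equals sqrt(2) - 1 and sigma_hat is sigma multiplied by sqrt(2). Outside [s_min, s_max] the sample is returned unchanged.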

scale_model_input

( sample: FloatTensor, timestep: typing.Optional[int] = None ) → torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int, device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.
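As a rough illustration of what a continuous noise schedule can look like, the sketch below follows the VE column of Table 1 in [1]: time steps are spaced geometrically between sigma_max² and sigma_min², and the noise level is the square root of the time step. The helper name `ve_sigma_schedule` is hypothetical, and whether the library stores these values in exactly this form is an assumption that this page does not confirm.

```python
import math


def ve_sigma_schedule(num_inference_steps, sigma_min=0.02, sigma_max=100.0):
    """Return decreasing noise levels from sigma_max to sigma_min (illustrative only)."""
    ratio = sigma_min**2 / sigma_max**2
    # Guard against division by zero when a single step is requested.
    denom = max(num_inference_steps - 1, 1)
    # t_i = sigma_max^2 * ratio^(i / (N - 1)); sigma(t) = sqrt(t) for VE models.
    return [
        math.sqrt(sigma_max**2 * ratio ** (i / denom))
        for i in range(num_inference_steps)
    ]


sigmas = ve_sigma_schedule(5)
```

The first entry equals sigma_max and the last is approximately sigma_min, with each level smaller than the one before.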

step

( model_output: FloatTensor, sigma_hat: float, sigma_prev: float, sample_hat: FloatTensor, return_dict: bool = True ) → KarrasVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • sigma_hat (float) — TODO
  • sigma_prev (float) — TODO
  • sample_hat (torch.FloatTensor) — TODO
  • return_dict (bool) — option for returning tuple rather than KarrasVeOutput class

    KarrasVeOutput — updated sample in the diffusion chain and derivative (TODO double check).

Returns

KarrasVeOutput or tuple

KarrasVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
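A minimal sketch of the first-order (Euler) update from Algorithm 2 of [1] may help make this concrete. It assumes a denoised estimate of the sample is already available; how `model_output` is converted into that estimate is not specified on this page, so the `denoised` argument and the function name `euler_step` are hypothetical.

```python
def euler_step(sample_hat, denoised, sigma_hat, sigma_prev):
    """One Euler step from noise level sigma_hat down to sigma_prev (illustrative only)."""
    # Derivative of the sample with respect to the noise level, evaluated at sigma_hat.
    derivative = [(x - d) / sigma_hat for x, d in zip(sample_hat, denoised)]
    # Move along that derivative by the (negative) change in noise level.
    step_size = sigma_prev - sigma_hat
    sample_prev = [x + step_size * g for x, g in zip(sample_hat, derivative)]
    return sample_prev, derivative


sample_prev, derivative = euler_step(
    sample_hat=[2.0, -1.0], denoised=[1.0, 0.0], sigma_hat=2.0, sigma_prev=1.0
)
```

Returning the derivative alongside the new sample mirrors the signature above, where a later correction step takes both as inputs.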

step_correct

( model_output: FloatTensor, sigma_hat: float, sigma_prev: float, sample_hat: FloatTensor, sample_prev: FloatTensor, derivative: FloatTensor, return_dict: bool = True ) → prev_sample (TODO)

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • sigma_hat (float) — TODO
  • sigma_prev (float) — TODO
  • sample_hat (torch.FloatTensor) — TODO
  • sample_prev (torch.FloatTensor) — TODO
  • derivative (torch.FloatTensor) — TODO
  • return_dict (bool) — option for returning tuple rather than KarrasVeOutput class

Returns

prev_sample (TODO)

updated sample in the diffusion chain. derivative (TODO): TODO

Correct the predicted sample based on the output model_output of the network. TODO complete description
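For orientation, the second-order correction in Algorithm 2 of [1] averages the derivative at sigma_hat with a second derivative evaluated at the predicted sample. The sketch below is written under that assumption and is not taken from the library source; `heun_correct` and `denoised_prev` (a denoised estimate at noise level sigma_prev) are hypothetical names.

```python
def heun_correct(sample_hat, sample_prev, denoised_prev, derivative, sigma_hat, sigma_prev):
    """Second-order correction of an Euler prediction (illustrative only)."""
    # In Algorithm 2 the correction is skipped when sigma_prev is zero,
    # because the second derivative would divide by zero.
    if sigma_prev == 0:
        return list(sample_prev)
    # Derivative evaluated at the predicted sample and the lower noise level.
    derivative_corr = [(x - d) / sigma_prev for x, d in zip(sample_prev, denoised_prev)]
    step_size = sigma_prev - sigma_hat
    # Redo the step from sample_hat using the average of both derivatives.
    return [
        x + step_size * (0.5 * d + 0.5 * dc)
        for x, d, dc in zip(sample_hat, derivative, derivative_corr)
    ]


corrected = heun_correct(
    sample_hat=[2.0],
    sample_prev=[1.5],
    denoised_prev=[1.0],
    derivative=[0.5],
    sigma_hat=2.0,
    sigma_prev=1.0,
)
```

When both derivatives agree, as in this toy input, the corrected sample coincides with the Euler prediction.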