Diffusers documentation

Denoising Diffusion Probabilistic Models (DDPM)

You are viewing v0.18.2 version. A newer version v0.31.0 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Denoising Diffusion Probabilistic Models (DDPM)

Overview

Denoising Diffusion Probabilistic Models (DDPM) by Jonathan Ho, Ajay Jain and Pieter Abbeel proposes the diffusion based model of the same name, but in the context of the 🤗 Diffusers library, DDPM refers to the discrete denoising scheduler from the paper as well as the pipeline.

The abstract of the paper is the following:

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.

The original paper can be found here.

DDPMScheduler

class diffusers.DDPMScheduler

< >

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None variance_type: str = 'fixed_small' clip_sample: bool = True prediction_type: str = 'epsilon' thresholding: bool = False dynamic_thresholding_ratio: float = 0.995 clip_sample_range: float = 1.0 sample_max_value: float = 1.0 timestep_spacing: str = 'leading' steps_offset: int = 0 )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, squaredcos_cap_v2 or sigmoid.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • variance_type (str) — options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
  • clip_sample (bool, default True) — option to clip predicted sample for numerical stability.
  • clip_sample_range (float, default 1.0) — the maximum magnitude for sample clipping. Valid only when clip_sample=True.
  • prediction_type (str, default epsilon, optional) — prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample) or v_prediction` (see section 2.4 https://imagen.research.google/video/paper.pdf)
  • thresholding (bool, default False) — whether to use the “dynamic thresholding” method (introduced by Imagen, https://arxiv.org/abs/2205.11487). Note that the thresholding method is unsuitable for latent-space diffusion models (such as stable-diffusion).
  • dynamic_thresholding_ratio (float, default 0.995) — the ratio for the dynamic thresholding method. Default is 0.995, the same as Imagen (https://arxiv.org/abs/2205.11487). Valid only when thresholding=True.
  • sample_max_value (float, default 1.0) — the threshold value for dynamic thresholding. Valid only when thresholding=True.
  • timestep_spacing (str, default "leading") — The way the timesteps should be scaled. Refer to Table 2. of Common Diffusion Noise Schedules and Sample Steps are Flawed for more information.
  • steps_offset (int, default 0) — an offset added to the inference steps. You can use a combination of offset=1 and set_alpha_to_one=False, to make the last step use step 0 for the previous alpha product, as done in stable diffusion.

Denoising diffusion probabilistic models (DDPMs) explores the connections between denoising score matching and Langevin dynamics sampling.

~ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2006.11239

scale_model_input

< >

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

< >

( num_inference_steps: typing.Optional[int] = None device: typing.Union[str, torch.device] = None timesteps: typing.Optional[typing.List[int]] = None )

Parameters

  • num_inference_steps (Optional[int]) — the number of diffusion steps used when generating samples with a pre-trained model. If passed, then timesteps must be None.
  • device (str or torch.device, optional) — the device to which the timesteps are moved to.
  • custom_timesteps (List[int], optional) — custom timesteps used to support arbitrary spacing between timesteps. If None, then the default timestep spacing strategy of equal spacing between timesteps is used. If passed, num_inference_steps must be None.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

< >

( model_output: FloatTensor timestep: int sample: FloatTensor generator = None return_dict: bool = True ) ~schedulers.scheduling_utils.DDPMSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process. generator — random number generator.
  • return_dict (bool) — option for returning tuple rather than DDPMSchedulerOutput class

Returns

~schedulers.scheduling_utils.DDPMSchedulerOutput or tuple

~schedulers.scheduling_utils.DDPMSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).