Diffusers documentation

Denoising diffusion implicit models (DDIM)

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.14.0).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Denoising diffusion implicit models (DDIM)


Denoising Diffusion Implicit Models (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.

The abstract of the paper is the following:

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

The original codebase of this paper can be found here: ermongroup/ddim. For questions, feel free to contact the author on tsong.me.


class diffusers.DDIMScheduler

< >

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None clip_sample: bool = True set_alpha_to_one: bool = True steps_offset: int = 0 prediction_type: str = 'epsilon' thresholding: bool = False dynamic_thresholding_ratio: float = 0.995 clip_sample_range: float = 1.0 sample_max_value: float = 1.0 )


  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • clip_sample (bool, default True) — option to clip predicted sample for numerical stability.
  • clip_sample_range (float, default 1.0) — the maximum magnitude for sample clipping. Valid only when clip_sample=True.
  • set_alpha_to_one (bool, default True) — each diffusion step uses the value of alphas product at that step and at the previous one. For the final step there is no previous alpha. When this option is True the previous alpha product is fixed to 1, otherwise it uses the value of alpha at step 0.
  • steps_offset (int, default 0) — an offset added to the inference steps. You can use a combination of offset=1 and set_alpha_to_one=False, to make the last step use step 0 for the previous alpha product, as done in stable diffusion.
  • prediction_type (str, default epsilon, optional) — prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample) or v_prediction` (see section 2.4 https://imagen.research.google/video/paper.pdf)
  • thresholding (bool, default False) — whether to use the “dynamic thresholding” method (introduced by Imagen, https://arxiv.org/abs/2205.11487). Note that the thresholding method is unsuitable for latent-space diffusion models (such as stable-diffusion).
  • dynamic_thresholding_ratio (float, default 0.995) — the ratio for the dynamic thresholding method. Default is 0.995, the same as Imagen (https://arxiv.org/abs/2205.11487). Valid only when thresholding=True.
  • sample_max_value (float, default 1.0) — the threshold value for dynamic thresholding. Valid only when thresholding=True.

Denoising diffusion implicit models is a scheduler that extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with non-Markovian guidance.

~ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2010.02502


< >

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor


  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep



scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.


< >

( num_inference_steps: int device: typing.Union[str, torch.device] = None )


  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.


< >

( model_output: FloatTensor timestep: int sample: FloatTensor eta: float = 0.0 use_clipped_model_output: bool = False generator = None variance_noise: typing.Optional[torch.FloatTensor] = None return_dict: bool = True ) ~schedulers.scheduling_utils.DDIMSchedulerOutput or tuple


  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • eta (float) — weight of noise for added noise in diffusion step.
  • use_clipped_model_output (bool) — if True, compute “corrected” model_output from the clipped predicted original sample. Necessary because predicted original sample is clipped to [-1, 1] when self.config.clip_sample is True. If no clipping has happened, “corrected” model_output would coincide with the one provided as input and use_clipped_model_output will have not effect. generator — random number generator.
  • variance_noise (torch.FloatTensor) — instead of generating noise for the variance using generator, we can directly provide the noise for the variance itself. This is useful for methods such as CycleDiffusion. (https://arxiv.org/abs/2210.05559)
  • return_dict (bool) — option for returning tuple rather than DDIMSchedulerOutput class


~schedulers.scheduling_utils.DDIMSchedulerOutput or tuple

~schedulers.scheduling_utils.DDIMSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).