Diffusers documentation

VQDiffusionScheduler

You are viewing v0.12.0 version. A newer version v0.31.0 is available.

Overview

The original paper can be found at https://arxiv.org/abs/2111.14822.

VQDiffusionScheduler

class diffusers.VQDiffusionScheduler

( num_vec_classes: int, num_train_timesteps: int = 100, alpha_cum_start: float = 0.99999, alpha_cum_end: float = 9e-06, gamma_cum_start: float = 9e-06, gamma_cum_end: float = 0.99999 )

Parameters

  • num_vec_classes (int) — The number of classes of the vector embeddings of the latent pixels. Includes the class for the masked latent pixel.
  • num_train_timesteps (int) — Number of diffusion steps used to train the model.
  • alpha_cum_start (float) — The starting cumulative alpha value.
  • alpha_cum_end (float) — The ending cumulative alpha value.
  • gamma_cum_start (float) — The starting cumulative gamma value.
  • gamma_cum_end (float) — The ending cumulative gamma value.

The VQ-diffusion transformer outputs predicted probabilities of the initial unnoised image.

The VQ-diffusion scheduler converts the transformer’s output into a sample for the unnoised image at the previous diffusion timestep.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2111.14822
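The constructor takes start and end values for the cumulative alpha and gamma, but this page does not say how the intermediate timesteps are filled in. The sketch below assumes linear interpolation between the two endpoints and recovers per-step values as ratios of consecutive cumulative values. The helper names (`linear_cumulative_schedule`, `per_step_alpha`, `per_step_gamma`) are hypothetical and not part of diffusers.

```python
def linear_cumulative_schedule(start, end, num_steps):
    """Linearly interpolate a cumulative parameter from start to end (assumed spacing)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (end - start) / (num_steps - 1)
    return [start + i * step for i in range(num_steps)]


def per_step_alpha(alpha_cum):
    """Single-step alphas whose running product reproduces alpha_cum."""
    previous = [1.0] + alpha_cum[:-1]
    return [cur / prev for cur, prev in zip(alpha_cum, previous)]


def per_step_gamma(gamma_cum):
    """Single-step gammas, using 1 - gamma_cum[t] = product of (1 - gamma[s]) for s <= t."""
    keep_cum = [1.0 - g for g in gamma_cum]
    previous = [1.0] + keep_cum[:-1]
    return [1.0 - cur / prev for cur, prev in zip(keep_cum, previous)]


# Default values copied from the signature above.
alpha_cum = linear_cumulative_schedule(0.99999, 9e-06, 100)
gamma_cum = linear_cumulative_schedule(9e-06, 0.99999, 100)
alpha = per_step_alpha(alpha_cum)
gamma = per_step_gamma(gamma_cum)
```

With these defaults the cumulative alpha decays from almost 1 to almost 0 while the cumulative gamma (the probability of being masked) grows from almost 0 to almost 1 over the 100 training timesteps.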

log_Q_t_transitioning_to_known_class

( t: torch.int32, x_t: LongTensor, log_onehot_x_t: FloatTensor, cumulative: bool ) → torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)

Parameters

  • t (torch.Long) — The timestep that determines which transition matrix is used.
  • x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
  • log_onehot_x_t (torch.FloatTensor of shape (batch size, num classes, num latent pixels)) — The log one-hot vectors of x_t
  • cumulative (bool) — If cumulative is False, we use the single step transition matrix t-1->t. If cumulative is True, we use the cumulative transition matrix 0->t.

Returns

torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)

Each column of the returned matrix is a row of log probabilities of the complete probability transition matrix.

When cumulative, returns self.num_classes - 1 rows because the initial latent pixel x_0 cannot be masked. When non-cumulative, the row for the masked class C_k is included as well.

Where:

  • q_n is the probability distribution for the forward process of the nth latent pixel.
  • C_0 is a class of a latent pixel embedding
  • C_k is the class of the masked latent pixel

non-cumulative result (omitting logarithms):

    q_0(x_t | x_{t-1} = C_0) ... q_n(x_t | x_{t-1} = C_0)
              .      .                     .
              .               .            .
              .                      .     .
    q_0(x_t | x_{t-1} = C_k) ... q_n(x_t | x_{t-1} = C_k)

cumulative result (omitting logarithms):

    q_0_cumulative(x_t | x_0 = C_0)     ... q_n_cumulative(x_t | x_0 = C_0)
              .               .                          .
              .                        .                 .
              .                               .          .
    q_0_cumulative(x_t | x_0 = C_{k-1}) ... q_n_cumulative(x_t | x_0 = C_{k-1})

Returns the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each latent pixel in x_t.

See equation (7) for the complete non-cumulative transition matrix. The complete cumulative transition matrix is the same structure except the parameters (alpha, beta, gamma) are the cumulative analogs.
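To make the matrix structure concrete, here is a small pure-Python sketch of a single-step transition matrix with K ordinary classes plus one mask class, laid out as the paper describes in Equation (7). The function names, the toy parameter values, and the choice beta = (1 - alpha - gamma) / K (which makes every column sum to 1) are assumptions for illustration. The scheduler itself works with log probabilities on batched tensors, which this sketch does not attempt to reproduce.

```python
def transition_matrix(alpha, gamma, num_classes):
    """Build Q where Q[i][j] = q(x_t = C_i | x_{t-1} = C_j).

    The last class (index num_classes - 1) is the mask class.
    """
    k = num_classes - 1  # number of ordinary (non-mask) classes
    beta = (1.0 - alpha - gamma) / k
    q = [[0.0] * num_classes for _ in range(num_classes)]
    for j in range(k):  # previous state is an ordinary class
        for i in range(k):
            q[i][j] = alpha + beta if i == j else beta
        q[k][j] = gamma  # an ordinary class becomes masked
    q[k][k] = 1.0  # a masked pixel stays masked
    return q


def rows_for_pixels(q, x_t):
    """For each latent pixel class in x_t, pick the matching row of Q."""
    return [q[c] for c in x_t]


Q = transition_matrix(alpha=0.9, gamma=0.04, num_classes=4)
x_t = [0, 3, 2]  # three latent pixels; class 3 is the mask
rows = rows_for_pixels(Q, x_t)
```

Each entry of `rows` lists q(x_t | x_{t-1} = C_j) over all previous classes C_j for one latent pixel, which is the quantity that one column of the returned tensor holds in log form.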

q_posterior

( log_p_x_0, x_t, t ) → torch.FloatTensor of shape (batch size, num classes, num latent pixels)

Parameters

  • t (torch.Long) — The timestep that determines which transition matrix is used.

Returns

torch.FloatTensor of shape (batch size, num classes, num latent pixels)

The log probabilities for the predicted classes of the image at timestep t-1, i.e. Equation (11).

Calculates the log probabilities for the predicted classes of the image at timestep t-1, i.e. Equation (11).

Instead of directly computing equation (11), we use Equation (5) to restate Equation (11) in terms of only forward probabilities.

Equation (11) stated in terms of forward probabilities via Equation (5):

p(x_{t-1} | x_t) = sum( q(x_t | x_{t-1}) * q(x_{t-1} | x_0) * p(x_0) / q(x_t | x_0) )

Where:

  • the sum is over x_0 = {C_0 ... C_{k-1}} (classes for x_0)
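The restated posterior can be checked numerically on a toy problem. The sketch below evaluates the sum for a single latent pixel using plain probabilities rather than log probabilities, with two hand-picked transition steps. All names (`transition_matrix`, `matmul`, `posterior`) and parameter values are hypothetical, and the matrix layout Q[i][j] = q(x_t = C_i | x_{t-1} = C_j) with the mask as the last class is an assumption carried over from the paper's Equation (7).

```python
def transition_matrix(alpha, gamma, num_classes):
    """Q[i][j] = q(x_t = C_i | x_{t-1} = C_j); the last class is the mask."""
    k = num_classes - 1
    beta = (1.0 - alpha - gamma) / k
    q = [[beta] * k + [0.0] for _ in range(k)]
    for i in range(k):
        q[i][i] = alpha + beta
    q.append([gamma] * k + [1.0])
    return q


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def posterior(p_x0, x_t, q_step, q_cum_prev, q_cum):
    """p(x_{t-1} | x_t) for one latent pixel, summed over unmasked x_0 classes."""
    result = []
    for x_prev in range(len(q_step)):
        total = 0.0
        for x0, p in enumerate(p_x0):  # x_0 ranges over C_0 ... C_{k-1}
            total += q_step[x_t][x_prev] * q_cum_prev[x_prev][x0] * p / q_cum[x_t][x0]
        result.append(total)
    return result


Q1 = transition_matrix(0.9, 0.04, num_classes=4)  # step 0 -> 1
Q2 = transition_matrix(0.8, 0.10, num_classes=4)  # step 1 -> 2
Q_cum_2 = matmul(Q2, Q1)                          # q(x_2 | x_0)
p_x0 = [0.7, 0.2, 0.1]                            # predicted distribution over unmasked classes
post = posterior(p_x0, x_t=1, q_step=Q2, q_cum_prev=Q1, q_cum=Q_cum_2)
```

Because the cumulative matrix is the product of the single-step matrices, the entries of `post` sum to 1, which is a quick sanity check that the forward-probability form is a valid distribution over x_{t-1}.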

set_timesteps

( num_inference_steps: int, device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device) — The device to place the timesteps and the diffusion process parameters (alpha, beta, gamma) on.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.
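As an illustration of what the discrete timesteps for the diffusion chain could look like, the sketch below assumes the chain is walked in reverse, from num_inference_steps - 1 down to 0. That ordering and the helper name `descending_timesteps` are assumptions for illustration; this page does not state the exact values that set_timesteps produces.

```python
def descending_timesteps(num_inference_steps):
    """Return integer timesteps in reverse order (assumed ordering)."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be positive")
    return list(range(num_inference_steps - 1, -1, -1))


timesteps = descending_timesteps(5)
print(timesteps)  # [4, 3, 2, 1, 0]
```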

step

( model_output: FloatTensor, timestep: torch.int64, sample: LongTensor, generator: typing.Optional[torch._C.Generator] = None, return_dict: bool = True ) → VQDiffusionSchedulerOutput or tuple

Parameters

  • t (torch.long) — The timestep that determines which transition matrices are used.
  • x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
  • generator (torch.Generator or None) — RNG for the noise applied to p(x_{t-1} | x_t) before it is sampled from.
  • return_dict (bool) — Option for returning a tuple rather than a VQDiffusionSchedulerOutput class.

Returns

VQDiffusionSchedulerOutput or tuple

VQDiffusionSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep via the reverse transition distribution, i.e. Equation (11). See the docstring for self.q_posterior for more in-depth documentation on how Equation (11) is computed.
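The generator parameter refers to noise applied to p(x_{t-1} | x_t) before sampling. One common way to draw a categorical sample from log probabilities using additive noise is the Gumbel-max trick, and the sketch below assumes that technique for a single latent pixel. Whether the scheduler uses exactly this procedure is not stated on this page, and `gumbel_max_sample` is a hypothetical helper written with the standard library only.

```python
import math
import random


def gumbel_max_sample(log_probs, rng):
    """Draw one class index by adding Gumbel noise to log probabilities and taking the argmax."""
    eps = 1e-30  # keeps log() away from zero
    noised = []
    for lp in log_probs:
        u = rng.random()  # uniform in [0, 1)
        gumbel = -math.log(-math.log(u + eps) + eps)
        noised.append(lp + gumbel)
    return max(range(len(noised)), key=noised.__getitem__)


rng = random.Random(0)  # plays the role of the generator argument
log_p = [math.log(0.7), math.log(0.2), math.log(0.1)]
samples = [gumbel_max_sample(log_p, rng) for _ in range(1000)]
```

Seeding the random source makes the draw reproducible, which is the same reason a generator can be passed to step.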