VQDiffusionScheduler
Overview
The original paper can be found at https://arxiv.org/abs/2111.14822.
VQDiffusionScheduler
class diffusers.VQDiffusionScheduler
< source >( num_vec_classes: int, num_train_timesteps: int = 100, alpha_cum_start: float = 0.99999, alpha_cum_end: float = 9e-06, gamma_cum_start: float = 9e-06, gamma_cum_end: float = 0.99999 )
Parameters
- num_vec_classes (int) — The number of classes of the vector embeddings of the latent pixels, including the class for the masked latent pixel.
- num_train_timesteps (int) — Number of diffusion steps used to train the model.
- alpha_cum_start (float) — The starting cumulative alpha value.
- alpha_cum_end (float) — The ending cumulative alpha value.
- gamma_cum_start (float) — The starting cumulative gamma value.
- gamma_cum_end (float) — The ending cumulative gamma value.
The VQ-diffusion transformer outputs predicted probabilities of the initial unnoised image.
The VQ-diffusion scheduler converts the transformer’s output into a sample for the unnoised image at the previous diffusion timestep.
~ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.
For more details, see the original paper: https://arxiv.org/abs/2111.14822
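Concretely, the noise schedule is controlled by cumulative alpha and gamma values interpolated between the configured start and end values over the training timesteps. A minimal sketch of a linearly interpolated cumulative schedule, using the default config values above (an illustration of the idea, not the library's exact implementation):

```python
import numpy as np

def cumulative_schedule(num_train_timesteps=100, cum_start=0.99999, cum_end=9e-06):
    """Linearly interpolate a cumulative schedule from cum_start to cum_end."""
    steps = np.arange(num_train_timesteps)
    return cum_start + steps / (num_train_timesteps - 1) * (cum_end - cum_start)

# Cumulative alphas decrease toward 0 while cumulative gammas increase toward 1,
# so late timesteps are dominated by the masked class.
alpha_cum = cumulative_schedule()
gamma_cum = cumulative_schedule(cum_start=9e-06, cum_end=0.99999)
```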
log_Q_t_transitioning_to_known_class
< source >( t: torch.int32, x_t: LongTensor, log_onehot_x_t: FloatTensor, cumulative: bool ) → torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)
Parameters
- t (torch.Long) — The timestep that determines which transition matrix is used.
- x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
- log_onehot_x_t (torch.FloatTensor of shape (batch size, num classes, num latent pixels)) — The log one-hot vectors of x_t.
- cumulative (bool) — If cumulative is False, the single step transition matrix t-1 -> t is used. If cumulative is True, the cumulative transition matrix 0 -> t is used.
Returns
torch.FloatTensor
of shape (batch size, num classes - 1, num latent pixels)
Each column of the returned matrix is a row of log probabilities of the complete probability transition matrix.
When non-cumulative, self.num_classes - 1 rows are returned because the initial latent pixel cannot be masked.
Where:
- q_n is the probability distribution for the forward process of the nth latent pixel.
- C_0 is a class of a latent pixel embedding.
- C_k is the class of the masked latent pixel.
non-cumulative result (omitting logarithms):
q_0(x_t | x_{t-1} = C_0)     ...  q_n(x_t | x_{t-1} = C_0)
          .                   .              .
q_0(x_t | x_{t-1} = C_{k-1}) ...  q_n(x_t | x_{t-1} = C_{k-1})
cumulative result (omitting logarithms):
q_0_cumulative(x_t | x_0 = C_0)     ...  q_n_cumulative(x_t | x_0 = C_0)
          .                          .                .
q_0_cumulative(x_t | x_0 = C_{k-1}) ...  q_n_cumulative(x_t | x_0 = C_{k-1})
Returns the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each
latent pixel in x_t
.
See equation (7) for the complete non-cumulative transition matrix. The complete cumulative transition matrix is the same structure except the parameters (alpha, beta, gamma) are the cumulative analogs.
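As a concrete illustration of the single-step transition structure in equation (7): a non-masked latent pixel in class c stays in c with probability alpha_t + beta_t, moves to any other non-masked class with probability beta_t, and becomes masked with probability gamma_t, while the masked class is absorbing. A hypothetical helper sketching one row of that matrix (not the library's implementation):

```python
import numpy as np

def transition_row(c, num_classes, alpha_t, gamma_t):
    """Forward-process distribution q(x_t | x_{t-1} = c) for one latent pixel.

    Classes 0..num_classes-2 are embedding classes; class num_classes-1 is [MASK].
    Hypothetical sketch of the VQ-diffusion single-step transition structure.
    """
    mask_class = num_classes - 1
    if c == mask_class:
        # The masked class is absorbing: once masked, always masked.
        probs = np.zeros(num_classes)
        probs[mask_class] = 1.0
        return probs
    # beta_t spreads the remaining mass uniformly over the non-masked classes
    # so that alpha_t + (num_classes - 1) * beta_t + gamma_t = 1.
    beta_t = (1.0 - alpha_t - gamma_t) / (num_classes - 1)
    probs = np.full(num_classes, beta_t)
    probs[c] += alpha_t          # extra mass on staying in the same class
    probs[mask_class] = gamma_t  # probability of transitioning to [MASK]
    return probs

row = transition_row(c=2, num_classes=5, alpha_t=0.9, gamma_t=0.05)
```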
q_posterior
< source >( log_p_x_0, x_t, t ) → torch.FloatTensor of shape (batch size, num classes, num latent pixels)
Calculates the log probabilities for the predicted classes of the image at timestep t-1, i.e. Equation (11).
Instead of directly computing equation (11), we use Equation (5) to restate Equation (11) in terms of only forward probabilities.
Equation (11) stated in terms of forward probabilities via Equation (5):
p(x_{t-1} | x_t) = sum( q(x_t | x_{t-1}) * q(x_{t-1} | x_0) * p(x_0) / q(x_t | x_0) )
Where:
- the sum is over x_0 = {C_0 ... C_{k-1}} (the classes for x_0)
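The Bayes-rule combination above can be sketched numerically for a single latent pixel with three classes, using made-up forward probabilities purely for illustration (the scheduler's real implementation is vectorized over all pixels and works in log space):

```python
import numpy as np

# q_xt_given_xtm1[i] = q(x_t = observed | x_{t-1} = C_i)   (hypothetical values)
q_xt_given_xtm1 = np.array([0.7, 0.2, 0.1])
# q_xtm1_given_x0[i, j] = q(x_{t-1} = C_i | x_0 = C_j)
q_xtm1_given_x0 = np.array([[0.8, 0.1, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.1, 0.1, 0.8]])
# p_x0[j] = model-predicted probability that the unnoised pixel is class C_j
p_x0 = np.array([0.5, 0.3, 0.2])

# q(x_t = observed | x_0 = C_j), obtained by marginalizing over x_{t-1}
q_xt_given_x0 = q_xt_given_xtm1 @ q_xtm1_given_x0

# p(x_{t-1} | x_t) = sum over x_0 of q(x_t|x_{t-1}) q(x_{t-1}|x_0) p(x_0) / q(x_t|x_0)
posterior = (q_xt_given_xtm1[:, None] * q_xtm1_given_x0) @ (p_x0 / q_xt_given_x0)
# posterior already sums to 1 because p_x0 is normalized
```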
set_timesteps
< source >( num_inference_steps: int device: typing.Union[str, torch.device] = None )
Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.
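A minimal sketch of how discrete timesteps are typically laid out for a reverse diffusion chain (an illustration, not the library's exact code):

```python
import numpy as np

num_inference_steps = 100
# The reverse chain starts at the most-noised timestep and walks back to 0.
timesteps = np.arange(0, num_inference_steps)[::-1].copy()
```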
step
< source >( model_output: FloatTensor, timestep: torch.int64, sample: LongTensor, generator: Optional[torch.Generator] = None, return_dict: bool = True ) → ~schedulers.scheduling_utils.VQDiffusionSchedulerOutput or tuple
Parameters
- t (torch.long) — The timestep that determines which transition matrices are used.
- x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
- generator (torch.Generator or None) — RNG for the noise applied to p(x_{t-1} | x_t) before it is sampled from.
- return_dict (bool) — Option for returning a tuple rather than a VQDiffusionSchedulerOutput class.
Returns
~schedulers.scheduling_utils.VQDiffusionSchedulerOutput
or tuple
~schedulers.scheduling_utils.VQDiffusionSchedulerOutput if return_dict is True, otherwise a tuple.
When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep via the reverse transition distribution, i.e. Equation (11). See the docstring for self.q_posterior for more in-depth documentation on how Equation (11) is computed.
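Turning the per-pixel log probabilities into discrete classes can be sketched with the Gumbel-max trick, which samples from a categorical distribution by adding Gumbel noise to the log probabilities and taking the argmax over the class dimension (an illustrative sketch of the sampling step, not the scheduler's exact code):

```python
import numpy as np

def sample_classes(log_probs, rng):
    """Gumbel-max sampling: argmax over (log p + Gumbel noise) per latent pixel.

    log_probs: (batch size, num classes, num latent pixels) log probabilities.
    Returns: (batch size, num latent pixels) sampled class indices.
    """
    gumbel = rng.gumbel(size=log_probs.shape)
    return np.argmax(log_probs + gumbel, axis=1)

# Toy log probabilities for a batch of 2 samples, 5 classes, 7 latent pixels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 7))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
x_tm1 = sample_classes(log_probs, rng)
```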