Schedulers
Diffusers contains multiple pre-built schedule functions for the diffusion process.
What is a scheduler?
The schedule functions, denoted Schedulers in the library, take in the output of a trained model, a sample that the diffusion process is iterating on, and a timestep, and return a denoised sample.
- Schedulers define the methodology for iteratively adding noise to an image or for updating a sample based on model outputs.
- For training, the different ways of adding noise define the algorithmic process used to train a diffusion model on noisy images.
- For inference, the scheduler defines how to update a sample based on the output of a pretrained model.
- Schedulers are often defined by a noise schedule and an update rule that solves the underlying differential equation.
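To make the noise schedule concrete, here is a dependency-free sketch of the DDPM-style forward (noising) process with a linear beta schedule. It uses plain Python lists instead of tensors, and the helper names (linear_betas, cumulative_alphas, add_noise) are hypothetical, not part of the Diffusers API.

```python
import math
import random
from typing import List


def linear_betas(beta_start: float, beta_end: float, num_train_timesteps: int) -> List[float]:
    # Evenly spaced betas from beta_start to beta_end (both inclusive).
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    # alpha_bar_t = (1 - beta_0) * (1 - beta_1) * ... * (1 - beta_t)
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0: List[float], noise: List[float], t: int, alpha_bars: List[float]) -> List[float]:
    # Closed-form forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


betas = linear_betas(1e-4, 0.02, 1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_early = add_noise(x0, noise, 10, alpha_bars)  # still mostly signal
x_late = add_noise(x0, noise, 999, alpha_bars)  # almost pure noise
```

Training teaches the model to recover the noise from x_t; at inference the scheduler runs this process in reverse.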
Discrete versus continuous schedulers
All schedulers take in a timestep to predict the updated version of the sample being diffused.
The timestep indicates where in the diffusion process the step is: noisy data is generated by iterating forward in time, and inference is executed by propagating backward through the timesteps.
Different algorithms use timesteps that are either discrete (accepting int inputs), such as DDPMScheduler or PNDMScheduler, or continuous (accepting float inputs), such as the score-based schedulers ScoreSdeVeScheduler or ScoreSdeVpScheduler.
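A sketch of the two kinds of timestep sequence, assuming evenly strided integers for discrete schedulers and evenly spaced floats from 1 down to a small sampling_eps for continuous ones. Both helpers are hypothetical and only approximate what the individual schedulers compute.

```python
from typing import List


def discrete_timesteps(num_train_timesteps: int, num_inference_steps: int) -> List[int]:
    # Evenly strided integer timesteps, visited from most to least noisy.
    stride = num_train_timesteps // num_inference_steps
    return list(range(0, num_train_timesteps, stride))[::-1]


def continuous_timesteps(num_inference_steps: int, sampling_eps: float = 1e-5) -> List[float]:
    # Evenly spaced float timesteps from 1.0 down to sampling_eps.
    step = (1.0 - sampling_eps) / (num_inference_steps - 1)
    return [1.0 - i * step for i in range(num_inference_steps)]


print(discrete_timesteps(1000, 5))   # [800, 600, 400, 200, 0]
print(continuous_timesteps(3, 0.5))  # [1.0, 0.75, 0.5]
```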
Designing re-usable schedulers
The core design principle behind the schedule functions is that they are model-, system-, and framework-independent. This allows for rapid experimentation and cleaner abstractions in the code, where the model prediction is separated from the sample update. To this end, schedulers are designed so that:
- Schedulers can be used interchangeably between diffusion models during inference to find the preferred trade-off between speed and generation quality.
- Schedulers currently default to PyTorch but are designed to be framework-independent (partial NumPy support currently exists).
API
The core API for any new scheduler must follow a limited structure.
- Schedulers should provide one or more def step(...) functions that are called to update the generated sample iteratively.
- Schedulers should provide a set_timesteps(...) method that configures the parameters of a schedule function for a specific inference task.
- Schedulers should be framework-agnostic, but provide a simple way to convert the scheduler into a specific framework, such as PyTorch, with a set_format(...) method.
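The sketch below illustrates why this small API makes schedulers interchangeable: a sampling loop that only calls set_timesteps() and step() works with any object providing them. ToyLinearScheduler, ToyOutput, SchedulerLike and denoise are hypothetical stand-ins, not Diffusers classes; the toy update rule is chosen only so the result is easy to check, and set_format() is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class ToyOutput:
    prev_sample: List[float]


class SchedulerLike(Protocol):
    # The minimal surface a sampling loop relies on.
    timesteps: List[int]

    def set_timesteps(self, num_inference_steps: int) -> None: ...

    def step(self, model_output: List[float], timestep: int, sample: List[float]) -> ToyOutput: ...


class ToyLinearScheduler:
    """Hypothetical scheduler: removes an equal share of the predicted noise at every step."""

    def __init__(self, num_train_timesteps: int = 1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps: List[int] = []

    def set_timesteps(self, num_inference_steps: int) -> None:
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = list(range(0, self.num_train_timesteps, stride))[::-1]

    def step(self, model_output: List[float], timestep: int, sample: List[float]) -> ToyOutput:
        share = 1.0 / len(self.timesteps)
        return ToyOutput([s - share * m for s, m in zip(sample, model_output)])


def denoise(scheduler: SchedulerLike, model: Callable[[List[float], int], List[float]],
            sample: List[float], num_inference_steps: int) -> List[float]:
    # The loop relies only on set_timesteps() and step(), so any scheduler with this API fits.
    scheduler.set_timesteps(num_inference_steps)
    for t in scheduler.timesteps:
        model_output = model(sample, t)
        sample = scheduler.step(model_output, t, sample).prev_sample
    return sample


# A fake "model" that always predicts the same noise vector.
noise = [0.4, -0.2]
clean = [1.0, 2.0]
noisy = [c + n for c, n in zip(clean, noise)]
result = denoise(ToyLinearScheduler(), lambda x, t: noise, noisy, num_inference_steps=4)
```

Swapping in a different scheduler only requires another class with the same two methods; the loop itself does not change.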
The base class SchedulerMixin implements low level utilities used by multiple schedulers.
SchedulerMixin
Mixin containing common functions for the schedulers.
match_shape
( values: typing.Union[numpy.ndarray, torch.Tensor], broadcast_array: typing.Union[numpy.ndarray, torch.Tensor] )
Turns a 1-D array into an array or tensor with len(broadcast_array.shape) dims.
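In NumPy or PyTorch terms this amounts to appending trailing singleton axes, for example turning a per-batch vector of shape (B,) into shape (B, 1, 1, 1) so it broadcasts against a (B, C, H, W) sample. The trailing-axes behavior is an assumption about the implementation; this list-based sketch only shows the resulting shapes, and match_shape_list and shape are hypothetical helpers.

```python
from typing import Any, List, Tuple


def match_shape_list(values: List[float], target_ndim: int) -> List[Any]:
    # Give every entry of a 1-D input (target_ndim - 1) extra singleton levels,
    # so shape (B,) becomes (B, 1, ..., 1).
    out: List[Any] = list(values)
    for _ in range(target_ndim - 1):
        out = [[v] for v in out]
    return out


def shape(nested: Any) -> Tuple[int, ...]:
    # Shape of a regular nested list, read off its first elements.
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


per_batch_scale = [0.1, 0.2]  # one value per sample in a batch of 2
print(shape(match_shape_list(per_batch_scale, 4)))  # (2, 1, 1, 1)
```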
SchedulerOutput
The class `SchedulerOutput` contains the outputs from any scheduler's `step(...)` call.
class diffusers.schedulers.scheduling_utils.SchedulerOutput
( prev_sample: FloatTensor )
Base class for the scheduler’s step function output.
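A minimal stand-in for SchedulerOutput and the return_dict convention used by the step functions in this document: with return_dict=True the caller gets an object exposing prev_sample, otherwise a tuple whose first element is the sample. ToySchedulerOutput and toy_step are hypothetical names, and the update rule is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ToySchedulerOutput:
    # Mirrors the single documented field of SchedulerOutput.
    prev_sample: List[float]


def toy_step(sample: List[float], return_dict: bool = True) -> Union[ToySchedulerOutput, Tuple[List[float]]]:
    prev_sample = [0.5 * x for x in sample]  # placeholder update rule
    if not return_dict:
        # Tuple form: the first element is the sample.
        return (prev_sample,)
    return ToySchedulerOutput(prev_sample=prev_sample)


as_object = toy_step([2.0, 4.0])
as_tuple = toy_step([2.0, 4.0], return_dict=False)
print(as_object.prev_sample, as_tuple[0])  # [1.0, 2.0] [1.0, 2.0]
```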
Implemented Schedulers
Denoising diffusion implicit models (DDIM)
Original paper can be found here.
class diffusers.DDIMScheduler
( num_train_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, beta_schedule: str = 'linear', trained_betas: typing.Optional[numpy.ndarray] = None, timestep_values: typing.Optional[numpy.ndarray] = None, clip_sample: bool = True, set_alpha_to_one: bool = True, tensor_format: str = 'pt' )
Parameters
- num_train_timesteps (int): number of diffusion steps used to train the model.
- beta_start (float): the starting beta value of the noise schedule.
- beta_end (float): the final beta value.
- beta_schedule (str): the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
- trained_betas (np.ndarray, optional): TODO
- timestep_values (np.ndarray, optional): TODO
- clip_sample (bool, default True): option to clip the predicted sample between -1 and 1 for numerical stability.
- set_alpha_to_one (bool, default True): at the final step there is no previous alpha product; if True it is set to 1, otherwise the alpha product of the first training step (the final "non-previous" one) is used.
- tensor_format (str): whether the scheduler expects PyTorch or NumPy arrays.
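The three beta_schedule options can be sketched as follows. The formulas (evenly spaced betas, betas evenly spaced in square-root space, and the capped cosine schedule of Nichol and Dhariwal) reflect the common definitions of these schedules and are assumptions about the library's exact code; make_betas is a hypothetical helper. The cosine schedule ignores beta_start and beta_end.

```python
import math
from typing import List


def make_betas(beta_schedule: str, num_train_timesteps: int = 1000,
               beta_start: float = 0.0001, beta_end: float = 0.02) -> List[float]:
    n = num_train_timesteps
    if beta_schedule == "linear":
        # Evenly spaced betas.
        return [beta_start + (beta_end - beta_start) * i / (n - 1) for i in range(n)]
    if beta_schedule == "scaled_linear":
        # Evenly spaced in sqrt(beta), then squared.
        lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
        return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]
    if beta_schedule == "squaredcos_cap_v2":
        # Derive betas from a cosine-shaped alpha_bar(t), capping each beta at 0.999.
        def alpha_bar(t: float) -> float:
            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

        return [min(1 - alpha_bar((i + 1) / n) / alpha_bar(i / n), 0.999) for i in range(n)]
    raise ValueError(f"unknown beta_schedule: {beta_schedule}")


linear = make_betas("linear")
scaled = make_betas("scaled_linear")
cosine = make_betas("squaredcos_cap_v2")
```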
Denoising diffusion implicit models is a scheduler that extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with non-Markovian guidance.
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
For more details, see the original paper: https://arxiv.org/abs/2010.02502
set_timesteps
( num_inference_steps: int, offset: int = 0 )
Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.
step
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], eta: float = 0.0, use_clipped_model_output: bool = False, generator = None, return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- eta (float): weight of the noise added in the diffusion step; eta = 0 gives deterministic DDIM sampling.
- use_clipped_model_output (bool): TODO
- generator: random number generator.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
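As an illustration of what step computes for an epsilon-predicting model, here is a sketch of the DDIM update from the paper, written for plain Python lists. It omits clip_sample, use_clipped_model_output and the timestep bookkeeping, and takes the cumulative alpha products directly; ddim_step is a hypothetical helper, not the library method.

```python
import math
from typing import List, Optional


def ddim_step(eps: List[float], sample: List[float], alpha_bar_t: float, alpha_bar_prev: float,
              eta: float = 0.0, noise: Optional[List[float]] = None) -> List[float]:
    """One DDIM update for a model that predicts the noise eps (sketch, no clipping)."""
    # 1. Predicted clean sample x_0 from the noise prediction.
    x0 = [(x - math.sqrt(1 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t) for x, e in zip(sample, eps)]
    # 2. Standard deviation of the injected noise; eta = 0 makes the step deterministic.
    sigma = eta * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t) * (1 - alpha_bar_t / alpha_bar_prev))
    # 3. "Direction pointing to x_t", recombined with the predicted x_0.
    dir_coef = math.sqrt(1 - alpha_bar_prev - sigma ** 2)
    prev = [math.sqrt(alpha_bar_prev) * x0_i + dir_coef * e for x0_i, e in zip(x0, eps)]
    if sigma > 0:
        if noise is None:
            raise ValueError("noise must be provided when eta > 0")
        prev = [p + sigma * n for p, n in zip(prev, noise)]
    return prev


# If the model predicts the true noise exactly, a deterministic DDIM step lands on the
# forward-process sample at the earlier timestep (same x_0, same noise).
x0_true, eps_true = [0.3, -0.7], [1.2, 0.4]
a_t, a_prev = 0.5, 0.8
x_t = [math.sqrt(a_t) * x + math.sqrt(1 - a_t) * e for x, e in zip(x0_true, eps_true)]
x_prev = ddim_step(eps_true, x_t, a_t, a_prev)
expected = [math.sqrt(a_prev) * x + math.sqrt(1 - a_prev) * e for x, e in zip(x0_true, eps_true)]
```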
Denoising diffusion probabilistic models (DDPM)
Original paper can be found here.
class diffusers.DDPMScheduler
( num_train_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, beta_schedule: str = 'linear', trained_betas: typing.Optional[numpy.ndarray] = None, variance_type: str = 'fixed_small', clip_sample: bool = True, tensor_format: str = 'pt' )
Parameters
- num_train_timesteps (int): number of diffusion steps used to train the model.
- beta_start (float): the starting beta value of the noise schedule.
- beta_end (float): the final beta value.
- beta_schedule (str): the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
- trained_betas (np.ndarray, optional): TODO
- variance_type (str): options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
- clip_sample (bool, default True): option to clip the predicted sample between -1 and 1 for numerical stability.
- tensor_format (str): whether the scheduler expects PyTorch or NumPy arrays.
Denoising diffusion probabilistic models (DDPMs) explore the connections between denoising score matching and Langevin dynamics sampling.
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
For more details, see the original paper: https://arxiv.org/abs/2006.11239
set_timesteps
( num_inference_steps: int )
Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.
step
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], predict_epsilon = True, generator = None, return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- predict_epsilon (bool): whether the model predicts the noise, epsilon; set it to False when the model predicts the samples directly.
- generator: random number generator.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
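A sketch of the quantities behind a DDPM step for an epsilon-predicting model: the predicted x_0, the posterior mean of x_{t-1}, and the variance for two of the variance_type options (fixed_small uses the posterior variance, fixed_large uses beta_t). The next sample would then be mean + sqrt(variance) * z with z ~ N(0, I) for t > 0. ddpm_posterior is a hypothetical helper; clipping and the remaining variance types are omitted.

```python
import math
from typing import List, Tuple


def ddpm_posterior(eps: List[float], sample: List[float], t: int, betas: List[float],
                   variance_type: str = "fixed_small") -> Tuple[List[float], float]:
    """Mean and variance of p(x_{t-1} | x_t) for a noise-predicting model (sketch, no clipping)."""
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1 - b
        alpha_bars.append(prod)
    a_bar_t = alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    beta_t = betas[t]
    # Predicted x_0, then the posterior mean as a weighted blend of x_0 and x_t.
    x0 = [(x - math.sqrt(1 - a_bar_t) * e) / math.sqrt(a_bar_t) for x, e in zip(sample, eps)]
    coef_x0 = math.sqrt(a_bar_prev) * beta_t / (1 - a_bar_t)
    coef_xt = math.sqrt(1 - beta_t) * (1 - a_bar_prev) / (1 - a_bar_t)
    mean = [coef_x0 * a + coef_xt * x for a, x in zip(x0, sample)]
    if variance_type == "fixed_small":
        variance = beta_t * (1 - a_bar_prev) / (1 - a_bar_t)  # posterior variance (beta tilde)
    elif variance_type == "fixed_large":
        variance = beta_t
    else:
        raise NotImplementedError(f"variance_type {variance_type!r} is not covered by this sketch")
    return mean, variance


betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
mean0, var0 = ddpm_posterior([0.1], [0.3], 0, betas)
mean500, var500 = ddpm_posterior([0.1], [0.3], 500, betas, variance_type="fixed_large")
```

At t = 0 the mean collapses to the predicted x_0 and the fixed_small variance is 0, so the last step adds no noise.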
Variance exploding, stochastic sampling from Karras et al.
Original paper can be found here.
class diffusers.KarrasVeScheduler
( sigma_min: float = 0.02, sigma_max: float = 100, s_noise: float = 1.007, s_churn: float = 80, s_min: float = 0.05, s_max: float = 50, tensor_format: str = 'pt' )
Parameters
- sigma_min (float): minimum noise magnitude.
- sigma_max (float): maximum noise magnitude.
- s_noise (float): the amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011].
- s_churn (float): the parameter controlling the overall amount of stochasticity. A reasonable range is [0, 100].
- s_min (float): the start value of the sigma range where we add noise (enable stochasticity). A reasonable range is [0, 10].
- s_max (float): the end value of the sigma range where we add noise. A reasonable range is [0.2, 80].
- tensor_format (str): whether the scheduler expects PyTorch or NumPy arrays.
Stochastic sampling from Karras et al. [1] tailored to the Variance-Exploding (VE) models [2]. Use Algorithm 2 and the VE column of Table 1 from [1] for reference.
[1] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364
[2] Song, Yang, et al. “Score-based generative modeling through stochastic differential equations.” https://arxiv.org/abs/2011.13456
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as sigma_min. They can be accessed via scheduler.config.sigma_min.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
For more details on the parameters, see Appendix E of the original paper: “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364. The grid search values used to find the optimal {s_noise, s_churn, s_min, s_max} for a specific model are described in Table 5 of the paper.
add_noise_to_input
( sample: typing.Union[torch.FloatTensor, numpy.ndarray], sigma: float, generator: typing.Optional[torch._C.Generator] = None )
Explicit Langevin-like “churn” step of adding noise to the sample according to a factor gamma_i ≥ 0 to reach a higher noise level sigma_hat = sigma_i + gamma_i*sigma_i.
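A sketch of the churn step described above, following the churn part of Algorithm 2 in Karras et al. (2022): gamma_i is min(s_churn / N, sqrt(2) - 1) inside [s_min, s_max] and 0 elsewhere, where N is the number of inference steps. Passing num_inference_steps explicitly and using random.Random instead of a torch generator are assumptions of this list-based sketch; add_noise_to_input here is a hypothetical free function, not the scheduler method.

```python
import math
import random
from typing import List, Optional, Tuple


def add_noise_to_input(sample: List[float], sigma: float, num_inference_steps: int,
                       s_churn: float = 80.0, s_min: float = 0.05, s_max: float = 50.0,
                       s_noise: float = 1.007, rng: Optional[random.Random] = None) -> Tuple[List[float], float]:
    """Churn step: raise the noise level from sigma to sigma_hat by adding fresh noise (sketch)."""
    rng = rng or random.Random()
    # gamma is non-zero only inside [s_min, s_max] and is capped at sqrt(2) - 1.
    gamma = min(s_churn / num_inference_steps, math.sqrt(2) - 1) if s_min <= sigma <= s_max else 0.0
    sigma_hat = sigma + gamma * sigma
    # Add just enough noise (scaled by s_noise) to move from noise level sigma to sigma_hat.
    scale = math.sqrt(sigma_hat ** 2 - sigma ** 2)
    sample_hat = [x + scale * s_noise * rng.gauss(0.0, 1.0) for x in sample]
    return sample_hat, sigma_hat


# Outside [s_min, s_max] nothing changes; inside, sigma grows by at most a factor sqrt(2).
unchanged, sigma_same = add_noise_to_input([1.0, 2.0], 100.0, 50, rng=random.Random(0))
churned, sigma_up = add_noise_to_input([1.0, 2.0], 10.0, 50, rng=random.Random(1))
```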
set_timesteps
( num_inference_steps: int )
Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.
step
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], sigma_hat: float, sigma_prev: float, sample_hat: typing.Union[torch.FloatTensor, numpy.ndarray], return_dict: bool = True ) → KarrasVeOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- sigma_hat (float): the increased noise level returned by add_noise_to_input.
- sigma_prev (float): the noise level of the previous timestep, which this step moves to.
- sample_hat (torch.FloatTensor or np.ndarray): the noised sample returned by add_noise_to_input.
- return_dict (bool): whether to return a KarrasVeOutput class (True) or a plain tuple (False).
KarrasVeOutput: updated sample in the diffusion chain and derivative (TODO double check).
Returns
KarrasVeOutput or tuple
KarrasVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
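A sketch of the first-order (Euler) update in Algorithm 2 of Karras et al. (2022). To avoid guessing how step converts model_output into a clean-sample estimate, this hypothetical helper takes that estimate (denoised) directly; the derivative it returns is the slope that a second-order correction can reuse.

```python
from typing import List, Tuple


def karras_euler_step(denoised: List[float], sigma_hat: float, sigma_prev: float,
                      sample_hat: List[float]) -> Tuple[List[float], List[float]]:
    """Euler step from sigma_hat to sigma_prev (sketch).

    `denoised` is the network's estimate of the clean sample at (sample_hat, sigma_hat).
    """
    # Slope dx/dsigma of the probability-flow ODE at (sample_hat, sigma_hat).
    derivative = [(x - d) / sigma_hat for x, d in zip(sample_hat, denoised)]
    # Move along that slope from sigma_hat down to sigma_prev.
    sample_prev = [x + (sigma_prev - sigma_hat) * dx for x, dx in zip(sample_hat, derivative)]
    return sample_prev, derivative


# With a perfect denoiser, stepping all the way to sigma = 0 returns the clean value.
sample_prev, derivative = karras_euler_step([0.5], 2.0, 0.0, [3.0])
```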
step_correct
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], sigma_hat: float, sigma_prev: float, sample_hat: typing.Union[torch.FloatTensor, numpy.ndarray], sample_prev: typing.Union[torch.FloatTensor, numpy.ndarray], derivative: typing.Union[torch.FloatTensor, numpy.ndarray], return_dict: bool = True ) → prev_sample (TODO)
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- sigma_hat (float): the increased noise level returned by add_noise_to_input.
- sigma_prev (float): the noise level of the previous timestep.
- sample_hat (torch.FloatTensor or np.ndarray): the noised sample returned by add_noise_to_input.
- sample_prev (torch.FloatTensor or np.ndarray): the sample predicted by step.
- derivative (torch.FloatTensor or np.ndarray): the derivative returned by step.
- return_dict (bool): option for returning a tuple rather than a SchedulerOutput class.
Returns
prev_sample (TODO): updated sample in the diffusion chain. derivative (TODO): TODO
Correct the predicted sample based on the output model_output of the network. TODO complete description
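A sketch of the second-order (Heun) correction from Algorithm 2 of Karras et al. (2022): the slope is re-evaluated at the Euler prediction and the step from sample_hat is redone with the average of the two slopes. The hypothetical argument denoised_prev stands for the network's clean-sample estimate at (sample_prev, sigma_prev); Karras et al. skip this correction when the step ends at sigma = 0, which the sketch enforces.

```python
from typing import List


def karras_heun_correct(denoised_prev: List[float], sigma_hat: float, sigma_prev: float,
                        sample_hat: List[float], sample_prev: List[float],
                        derivative: List[float]) -> List[float]:
    """Trapezoidal correction of an Euler step from sigma_hat to sigma_prev (sketch)."""
    if sigma_prev == 0:
        raise ValueError("the correction is only applied when sigma_prev > 0")
    # Slope re-evaluated at the Euler prediction.
    derivative_corr = [(x - d) / sigma_prev for x, d in zip(sample_prev, denoised_prev)]
    # Redo the step from sample_hat with the average of both slopes.
    return [x + (sigma_prev - sigma_hat) * (0.5 * d1 + 0.5 * d2)
            for x, d1, d2 in zip(sample_hat, derivative, derivative_corr)]


# When the true trajectory is linear in sigma, Euler is already exact and Heun keeps it.
corrected = karras_heun_correct(denoised_prev=[0.5], sigma_hat=3.0, sigma_prev=1.0,
                                sample_hat=[6.5], sample_prev=[2.5], derivative=[2.0])
```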
Linear multistep scheduler for discrete beta schedules
Original implementation can be found here.
class diffusers.LMSDiscreteScheduler
( num_train_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, beta_schedule: str = 'linear', trained_betas: typing.Optional[numpy.ndarray] = None, timestep_values: typing.Optional[numpy.ndarray] = None, tensor_format: str = 'pt' )
Parameters
- num_train_timesteps (int): number of diffusion steps used to train the model.
- beta_start (float): the starting beta value of the noise schedule.
- beta_end (float): the final beta value.
- beta_schedule (str): the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear or scaled_linear.
- trained_betas (np.ndarray, optional): TODO
- timestep_values (np.ndarray, optional): TODO
- tensor_format (str): whether the scheduler expects PyTorch or NumPy arrays.
Linear Multistep Scheduler for discrete beta schedules. Based on the original k-diffusion implementation by Katherine Crowson: https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
get_lms_coefficient
( order, t, current_order )
Compute a linear multistep coefficient.
set_timesteps
( num_inference_steps: int )
Sets the timesteps used for the diffusion chain. Supporting function to be run before inference.
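A sketch of how noise levels can be derived for a discrete-beta model: each training alpha product is converted to sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t), evaluated at evenly spaced fractional timesteps by linear interpolation, and a final 0 is appended. This follows the k-diffusion approach in spirit but is an assumption about the exact implementation; lms_sigmas is a hypothetical helper.

```python
import math
from typing import List


def lms_sigmas(alphas_cumprod: List[float], num_inference_steps: int) -> List[float]:
    """Noise levels at evenly spaced (possibly fractional) timesteps, from noisiest to 0."""
    T = len(alphas_cumprod)
    # Convert each training noise level to sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t).
    train_sigmas = [math.sqrt((1 - a) / a) for a in alphas_cumprod]
    sigmas = []
    for i in range(num_inference_steps):
        # Timesteps run linearly from T - 1 down to 0 and may fall between integers.
        t = (T - 1) * (1 - i / (num_inference_steps - 1))
        lo, hi = math.floor(t), math.ceil(t)
        frac = t - lo
        # Linear interpolation between the two neighbouring training sigmas.
        sigmas.append((1 - frac) * train_sigmas[lo] + frac * train_sigmas[hi])
    # A trailing 0 marks the fully denoised end point.
    return sigmas + [0.0]


alphas_cumprod, prod = [], 1.0
for i in range(1000):
    prod *= 1 - (1e-4 + (0.02 - 1e-4) * i / 999)  # linear beta schedule
    alphas_cumprod.append(prod)
sigmas = lms_sigmas(alphas_cumprod, 10)
```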
step
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], order: int = 4, return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- order (int, default 4): order of the linear multistep method, i.e. how many previous derivatives are combined in each step.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
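A sketch of the linear multistep update: get_lms_coefficient integrates a Lagrange basis polynomial over the next sigma interval, and step combines the stored derivatives with these coefficients. The helper names are hypothetical. k-diffusion integrates numerically with SciPy; this sketch uses Simpson's rule instead, which is exact here because the basis polynomials have degree at most 3 when order <= 4.

```python
from typing import List


def lms_coefficient(sigmas: List[float], order: int, t: int, current_order: int) -> float:
    """Integral over [sigmas[t], sigmas[t + 1]] of the Lagrange basis polynomial through
    sigmas[t], ..., sigmas[t - order + 1] that equals 1 at sigmas[t - current_order]."""
    def basis(tau: float) -> float:
        prod = 1.0
        for k in range(order):
            if k != current_order:
                prod *= (tau - sigmas[t - k]) / (sigmas[t - current_order] - sigmas[t - k])
        return prod

    # Simpson's rule is exact for polynomials of degree <= 3, which covers order <= 4.
    a, b = sigmas[t], sigmas[t + 1]
    return (b - a) / 6.0 * (basis(a) + 4.0 * basis((a + b) / 2.0) + basis(b))


def lms_update(sample: List[float], derivatives: List[List[float]], sigmas: List[float],
               t: int, order: int = 4) -> List[float]:
    """Combine stored derivatives (newest last) into one linear multistep update."""
    # Early steps do not have enough history yet, so the order is reduced.
    order = min(t + 1, order, len(derivatives))
    coeffs = [lms_coefficient(sigmas, order, t, j) for j in range(order)]
    newest_first = derivatives[::-1]  # coeffs[j] multiplies the derivative from j steps ago
    return [x + sum(c * newest_first[j][i] for j, c in enumerate(coeffs))
            for i, x in enumerate(sample)]


sigmas = [14.6, 9.0, 5.5, 3.0, 1.2, 0.0]
first = lms_update([2.0], [[0.5]], sigmas, t=0)  # order 1 reduces to an Euler step
```

With equally spaced sigmas the coefficients reduce to the classical Adams-Bashforth weights, e.g. -23/12 for the newest derivative at order 3 and step -1.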
Pseudo numerical methods for diffusion models (PNDM)
Original implementation can be found here.
class diffusers.PNDMScheduler
( num_train_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, beta_schedule: str = 'linear', trained_betas: typing.Optional[numpy.ndarray] = None, tensor_format: str = 'pt', skip_prk_steps: bool = False )
Parameters
- num_train_timesteps (int): number of diffusion steps used to train the model.
- beta_start (float): the starting beta value of the noise schedule.
- beta_end (float): the final beta value.
- beta_schedule (str): the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
- trained_betas (np.ndarray, optional): TODO
- tensor_format (str): whether the scheduler expects PyTorch or NumPy arrays.
- skip_prk_steps (bool): allows the scheduler to skip the Runge-Kutta steps that are defined in the original paper as being required before PLMS steps; defaults to False.
Pseudo numerical methods for diffusion models (PNDM) proposes using more advanced ODE integration techniques, namely the Runge-Kutta method and a linear multistep method.
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
For more details, see the original paper: https://arxiv.org/abs/2202.09778
set_timesteps
( num_inference_steps: int, offset: int = 0 )
Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.
step
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
This function calls step_prk() or step_plms() depending on the internal variable counter.
step_plms
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Step function propagating the sample with the linear multistep method. It needs only one forward pass per step, reusing the model outputs from several previous timesteps to approximate the solution.
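As described in the PNDM paper, the linear multistep (PLMS) phase combines the four most recent model outputs with fourth-order Adams-Bashforth weights (55, -59, 37, -9)/24. The sketch shows only that weighting, not the transfer step that turns the combined output into the next sample; plms_combine is a hypothetical helper. Because Adams-Bashforth 4 is exact for cubic integrands, the check integrates t**3 over one unit step.

```python
from typing import List


def plms_combine(ets: List[List[float]]) -> List[float]:
    """Fourth-order Adams-Bashforth combination of the last four model outputs.

    ets[-1] is the newest output and ets[-4] the oldest.
    """
    e0, e1, e2, e3 = ets[-1], ets[-2], ets[-3], ets[-4]
    return [(55 * a - 59 * b + 37 * c - 9 * d) / 24 for a, b, c, d in zip(e0, e1, e2, e3)]


# f(t) = t**3 sampled at t = -3, -2, -1, 0 with step h = 1:
# h * combination equals the exact integral of t**3 over [0, 1], which is 0.25.
history = [[float(t) ** 3] for t in (-3, -2, -1, 0)]
print(plms_combine(history))  # [0.25]
```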
step_prk
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], return_dict: bool = True ) → SchedulerOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- return_dict (bool): whether to return a SchedulerOutput class (True) or a plain tuple (False).
Returns
SchedulerOutput or tuple
SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Step function propagating the sample with the Runge-Kutta method. RK takes 4 forward passes to approximate the solution to the differential equation.
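For comparison, the classical fourth-order Runge-Kutta step below needs four evaluations of the derivative, which is why the Runge-Kutta phase costs four forward passes; the four evaluations are combined with weights 1/6, 1/3, 1/3, 1/6. This is a generic scalar ODE sketch, not the scheduler's code, and rk4_step is a hypothetical helper.

```python
import math
from typing import Callable


def rk4_step(f: Callable[[float, float], float], t: float, y: float, h: float) -> float:
    """One classical Runge-Kutta step for dy/dt = f(t, y): four evaluations of f."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


# dy/dt = y with y(0) = 1 has the exact solution e**t.
y1 = rk4_step(lambda t, y: y, 0.0, 1.0, 0.1)
error = abs(y1 - math.exp(0.1))
```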
Variance exploding stochastic differential equation (SDE) scheduler
Original paper can be found here.
class diffusers.ScoreSdeVeScheduler
( num_train_timesteps: int = 2000, snr: float = 0.15, sigma_min: float = 0.01, sigma_max: float = 1348.0, sampling_eps: float = 1e-05, correct_steps: int = 1, tensor_format: str = 'pt' )
Parameters
- snr (float): coefficient weighting the step from the model_output sample (from the network) to the random noise.
- sigma_min (float): initial noise scale for the sigma sequence in the sampling procedure. The minimum sigma should mirror the distribution of the data.
- sigma_max (float): maximum value used for the range of continuous timesteps passed into the model.
- sampling_eps (float): the end value of sampling, where timesteps decrease progressively from 1 to epsilon.
- correct_steps (int): number of correction steps performed on a produced sample.
- tensor_format (str): “np” or “pt” for the expected format of samples passed to the Scheduler.
The variance exploding stochastic differential equation (SDE) scheduler.
For more information, see the original paper: https://arxiv.org/abs/2011.13456
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
set_sigmas
( num_inference_steps: int, sigma_min: float = None, sigma_max: float = None, sampling_eps: float = None )
Parameters
- num_inference_steps (int): the number of diffusion steps used when generating samples with a pre-trained model.
- sigma_min (float, optional): initial noise scale value (overrides the value given at Scheduler instantiation).
- sigma_max (float, optional): final noise scale value (overrides the value given at Scheduler instantiation).
- sampling_eps (float, optional): final timestep value (overrides the value given at Scheduler instantiation).
Sets the noise scales used for the diffusion chain. Supporting function to be run before inference.
The sigmas control the weight of the drift and diffusion components of the sample update.
set_timesteps
( num_inference_steps: int, sampling_eps: float = None )
Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.
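A sketch of the continuous schedule for the variance exploding SDE: timesteps evenly spaced from 1 down to sampling_eps, and noise scales sigma(t) = sigma_min * (sigma_max / sigma_min) ** t as in Song et al. (2021). sde_ve_schedule is a hypothetical helper that combines what set_timesteps and set_sigmas configure; its defaults are taken from the class signature.

```python
from typing import List, Tuple


def sde_ve_schedule(num_inference_steps: int, sigma_min: float = 0.01, sigma_max: float = 1348.0,
                    sampling_eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Continuous timesteps from 1 down to sampling_eps and the matching geometric noise scales."""
    step = (1.0 - sampling_eps) / (num_inference_steps - 1)
    timesteps = [1.0 - i * step for i in range(num_inference_steps)]
    # sigma(t) is sigma_max at t = 1 and approaches sigma_min as t -> 0.
    sigmas = [sigma_min * (sigma_max / sigma_min) ** t for t in timesteps]
    return timesteps, sigmas


timesteps, sigmas = sde_ve_schedule(5)
```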
step_correct
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], sample: typing.Union[torch.FloatTensor, numpy.ndarray], generator: typing.Optional[torch._C.Generator] = None, return_dict: bool = True, **kwargs ) → SdeVeOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- generator: random number generator.
- return_dict (bool): whether to return an SdeVeOutput class (True) or a plain tuple (False).
Returns
SdeVeOutput or tuple
SdeVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Correct the predicted sample based on the output model_output of the network. This is often run repeatedly after making the prediction for the previous timestep.
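A sketch of a single Langevin corrector step as described by Song et al. (2021): the step size is derived from the snr parameter and the norms of the noise and of the model output (the estimated score), and the update adds both a score term and fresh noise. It works on one flattened sample with an explicitly passed noise vector so that it is deterministic; langevin_correct is a hypothetical helper.

```python
import math
from typing import List


def langevin_correct(score: List[float], sample: List[float], noise: List[float],
                     snr: float = 0.15) -> List[float]:
    """One Langevin corrector step (sketch); `noise` should be a standard-normal draw."""
    grad_norm = math.sqrt(sum(s * s for s in score))
    noise_norm = math.sqrt(sum(z * z for z in noise))
    # Step size chosen so the ratio between the score step and the noise matches `snr`.
    step_size = 2 * (snr * noise_norm / grad_norm) ** 2
    return [x + step_size * s + math.sqrt(2 * step_size) * z
            for x, s, z in zip(sample, score, noise)]


corrected = langevin_correct(score=[3.0, 4.0], sample=[0.0, 0.0], noise=[0.0, 1.0], snr=0.5)
```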
step_pred
( model_output: typing.Union[torch.FloatTensor, numpy.ndarray], timestep: int, sample: typing.Union[torch.FloatTensor, numpy.ndarray], generator: typing.Optional[torch._C.Generator] = None, return_dict: bool = True, **kwargs ) → SdeVeOutput or tuple
Parameters
- model_output (torch.FloatTensor or np.ndarray): direct output from the learned diffusion model.
- timestep (int): current discrete timestep in the diffusion chain.
- sample (torch.FloatTensor or np.ndarray): current instance of the sample being created by the diffusion process.
- generator: random number generator.
- return_dict (bool): whether to return an SdeVeOutput class (True) or a plain tuple (False).
Returns
SdeVeOutput or tuple
SdeVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
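A sketch of the reverse-diffusion predictor for a variance exploding SDE: moving from noise level sigma to the next smaller level adjacent_sigma, the squared diffusion term is g^2 = sigma^2 - adjacent_sigma^2, the sample moves by g^2 times the score (the model output), and noise scaled by g is added. The function name and argument layout are hypothetical.

```python
import math
from typing import List


def reverse_diffusion_predict(score: List[float], sample: List[float], sigma: float,
                              adjacent_sigma: float, noise: List[float]) -> List[float]:
    """One predictor step from noise level `sigma` down to `adjacent_sigma` (sketch)."""
    # Squared diffusion coefficient of the discretised VE SDE between the two levels.
    diffusion_sq = sigma ** 2 - adjacent_sigma ** 2
    # The forward VE SDE has no drift, so the reverse drift is -g^2 * score;
    # subtracting it moves the sample along the score.
    mean = [x + diffusion_sq * s for x, s in zip(sample, score)]
    return [m + math.sqrt(diffusion_sq) * z for m, z in zip(mean, noise)]


# For data N(0, 1) the score at x = 2 is -2, so a noiseless step to sigma = 0 returns 0.
denoised = reverse_diffusion_predict([-2.0], [2.0], sigma=1.0, adjacent_sigma=0.0, noise=[0.0])
noisy_step = reverse_diffusion_predict([0.5], [0.0], sigma=2.0, adjacent_sigma=1.0, noise=[1.0])
```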
Variance preserving stochastic differential equation (SDE) scheduler
Original paper can be found here.
Score SDE-VP is under construction.
class diffusers.schedulers.ScoreSdeVpScheduler
( num_train_timesteps = 2000, beta_min = 0.1, beta_max = 20, sampling_eps = 0.001, tensor_format = 'np' )
The variance preserving stochastic differential equation (SDE) scheduler.
ConfigMixin takes care of storing all config attributes that are passed in the scheduler's __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps.
ConfigMixin also provides general loading and saving functionality via the save_config() and from_config() functions.
For more information, see the original paper: https://arxiv.org/abs/2011.13456
UNDER CONSTRUCTION