The .optimization module provides:
an optimizer with fixed weight decay that can be used to fine-tune models, and
several schedules in the form of schedule objects that inherit from _LRScheduler:
AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-06, weight_decay=0.0, correct_bias=True)¶
Implements Adam algorithm with weight decay fix.
lr (float) – learning rate. Default: 1e-3.
betas (tuple of 2 floats) – Adam's beta parameters (b1, b2). Default: (0.9, 0.999).
eps (float) – Adam's epsilon for numerical stability. Default: 1e-6.
weight_decay (float) – weight decay to apply. Default: 0.0.
correct_bias (bool) – can be set to False to avoid correcting bias in Adam (e.g. as in the BERT TF repository). Default: True.
step(closure=None)¶
Performs a single optimization step.
closure (callable, optional) – A closure that reevaluates the model and returns the loss.
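The decoupled weight-decay update can be sketched in plain Python. This is a simplified scalar version for illustration, not the library's implementation; the function and state names are hypothetical:

```python
import math

def adamw_step(p, grad, state, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-6, weight_decay=0.0, correct_bias=True):
    """One AdamW update on a single scalar parameter p.

    The point of the weight decay "fix" is that decay is applied to p
    directly, decoupled from the Adam moment estimates.
    """
    state["step"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad   # second moment
    step_size = lr
    if correct_bias:  # bias correction as in the original Adam paper
        step_size *= math.sqrt(1 - b2 ** state["step"]) / (1 - b1 ** state["step"])
    p = p - step_size * state["m"] / (math.sqrt(state["v"]) + eps)
    # decoupled weight decay: scales the parameter itself, not the gradient
    p = p - lr * weight_decay * p
    return p

# minimizing loss = p**2 from p = 1.0
state = {"step": 0, "m": 0.0, "v": 0.0}
p = 1.0
for _ in range(100):
    grad = 2 * p
    p = adamw_step(p, grad, state, lr=0.1)
```

With correct_bias=False the first steps are smaller, which matches the behavior of the original BERT TensorFlow training code.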
get_constant_schedule(optimizer, last_epoch=-1)¶
Create a schedule with a constant learning rate.
get_constant_schedule_with_warmup(optimizer, num_warmup_steps, last_epoch=-1)¶
Create a schedule with a constant learning rate, preceded by a warmup period during which the learning rate increases linearly from 0 to the initial lr set in the optimizer.
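The multiplier this schedule applies to the optimizer's initial lr at each step can be sketched as a plain-Python function (an approximation of the underlying lambda, not the library code):

```python
def constant_with_warmup_lambda(step, num_warmup_steps):
    """Multiplier applied to the initial lr at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # linear warmup from 0 to 1
    return 1.0  # constant afterwards

# e.g. with num_warmup_steps=10: step 0 -> 0.0, step 5 -> 0.5, step 10 on -> 1.0
```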
get_cosine_with_hard_restarts_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_cycles=1.0, last_epoch=-1)¶
Create a schedule with a learning rate that decreases following the values of the cosine function, with several hard restarts, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
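The shape of this schedule can be sketched as a multiplier on the initial lr (a plain-Python approximation, not the library's lambda):

```python
import math

def cosine_hard_restarts_lambda(step, num_warmup_steps, num_training_steps,
                                num_cycles=1.0):
    """lr multiplier: linear warmup, then cosine decay restarted num_cycles times."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    if progress >= 1.0:
        return 0.0
    # the modulo produces the hard restarts: each cycle re-enters at multiplier 1.0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
```

With num_cycles=2.0 over 100 steps and no warmup, the multiplier decays from 1.0 to 0 over the first 50 steps, then jumps back to 1.0 (the hard restart) and decays again.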
get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1)¶
Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to that initial lr.
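The corresponding multiplier on the initial lr can be sketched as (a plain-Python approximation of the lambda, not the library code):

```python
def linear_with_warmup_lambda(step, num_warmup_steps, num_training_steps):
    """lr multiplier: 0 -> 1 during warmup, then 1 -> 0 linearly."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step)
               / max(1, num_training_steps - num_warmup_steps))
```

In a training loop, the schedule object returned by get_linear_schedule_with_warmup is typically stepped once per optimizer step, so that the lr follows this ramp over the whole run.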