Optimizer

The .optimization module provides:

  • an optimizer with weight decay fixed that can be used to fine-tune models, and

  • several learning rate schedules in the form of schedule objects that inherit from _LRSchedule (see Schedules below).

AdamW

class transformers.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-06, weight_decay=0.0, correct_bias=True)[source]

Implements the Adam algorithm with the weight decay fix.

Parameters
  • lr (float) – Learning rate. Default: 1e-3.

  • betas (tuple of 2 floats) – Adam's beta parameters (b1, b2). Default: (0.9, 0.999).

  • eps (float) – Adam's epsilon. Default: 1e-6.

  • weight_decay (float) – Weight decay. Default: 0.0.

  • correct_bias (bool) – Can be set to False to skip the bias correction in Adam (e.g. as in the BERT TF repository). Default: True.

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
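
Example: a minimal sketch of fine-tuning with AdamW. The stand-in model, the 0.01 weight decay value and the no_decay name filter are illustrative assumptions, not part of the API; they show the common practice of excluding biases and LayerNorm weights from weight decay.

    import torch
    from transformers import AdamW

    model = torch.nn.Linear(768, 2)  # illustrative stand-in for a pretrained model

    # Common practice: do not apply weight decay to biases and LayerNorm weights.
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {"params": [p for n, p in model.named_parameters()
                    if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in model.named_parameters()
                    if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
    ]
    optimizer = AdamW(grouped_parameters, lr=1e-5, eps=1e-6, correct_bias=True)

    # One optimization step: compute a loss, backpropagate, then step().
    loss = model(torch.randn(4, 768)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()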

Schedules

transformers.get_constant_schedule(optimizer, last_epoch=-1)[source]

Create a schedule with a constant learning rate.

transformers.get_constant_schedule_with_warmup(optimizer, num_warmup_steps, last_epoch=-1)[source]

Create a schedule with a constant learning rate, preceded by a warmup period during which the learning rate increases linearly from 0 to the initial learning rate set in the optimizer.
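
Example: a minimal sketch of the constant-with-warmup schedule; the stand-in model, learning rate and warmup length are illustrative assumptions.

    import torch
    from transformers import AdamW, get_constant_schedule_with_warmup

    model = torch.nn.Linear(10, 2)   # illustrative stand-in model
    optimizer = AdamW(model.parameters(), lr=1e-5)
    scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)

    for step in range(3):
        optimizer.step()    # no gradients yet, so this is a no-op; shown only to keep the optimizer/scheduler call order
        scheduler.step()    # advance the warmup
        print(optimizer.param_groups[0]["lr"])   # ramps up toward 1e-5 over the first 100 steps, then stays there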

transformers.get_cosine_with_hard_restarts_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_cycles=1.0, last_epoch=-1)[source]

Create a schedule with a learning rate that decreases following the values of the cosine function, with several hard restarts, after a warmup period during which it increases linearly from 0 to the initial learning rate set in the optimizer.
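
Example: a minimal sketch of the cosine-with-hard-restarts schedule; the stand-in model, learning rate, step counts and num_cycles value are illustrative assumptions.

    import torch
    from transformers import AdamW, get_cosine_with_hard_restarts_schedule_with_warmup

    model = torch.nn.Linear(10, 2)   # illustrative stand-in model
    optimizer = AdamW(model.parameters(), lr=1e-5)

    # Warm up for 100 steps, then run two cosine decay cycles over the remaining
    # 900 steps, with a hard restart back to the initial learning rate between them.
    scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000, num_cycles=2.0
    )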

transformers.get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1)[source]

Create a schedule with a learning rate that decreases linearly to 0 over num_training_steps, after a warmup period during which it increases linearly from 0 to the initial learning rate set in the optimizer.
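
Example: a minimal sketch of a fine-tuning loop with the linear schedule. The stand-in model, toy dataloader, epoch count and 10% warmup fraction are illustrative assumptions; the point is that scheduler.step() is called once per optimizer step.

    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    model = torch.nn.Linear(768, 2)   # illustrative stand-in for a pretrained model
    dataloader = [(torch.randn(8, 768), torch.randint(0, 2, (8,))) for _ in range(10)]

    num_epochs = 3
    num_training_steps = num_epochs * len(dataloader)
    optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * num_training_steps),
        num_training_steps=num_training_steps,
    )

    for epoch in range(num_epochs):
        for inputs, labels in dataloader:
            loss = torch.nn.functional.cross_entropy(model(inputs), labels)
            loss.backward()
            optimizer.step()
            scheduler.step()        # update the learning rate once per optimizer step
            optimizer.zero_grad()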