Optimization¶

The .optimization module provides:

  • an optimizer with weight decay fixed that can be used to fine-tune models,

  • several schedules in the form of schedule objects that inherit from _LRSchedule, and

  • a gradient accumulation class to accumulate the gradients of multiple batches (a combined usage sketch follows this list).
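
As orientation, here is a minimal sketch of how these pieces fit together in a PyTorch fine-tuning step. The model, loss, and hyperparameter values are placeholders, and import locations may differ between library versions.

    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    model = torch.nn.Linear(10, 2)  # placeholder for any fine-tunable model
    optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )

    for batch in [torch.randn(8, 10) for _ in range(3)]:  # dummy batches
        loss = model(batch).pow(2).mean()                 # dummy loss
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the learning-rate schedule once per step
        optimizer.zero_grad()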

AdamW (PyTorch)¶
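A common pattern with this optimizer is to exclude biases and LayerNorm weights from weight decay via parameter groups. The sketch below assumes a toy model with HF-style parameter names; the decay value of 0.01 is a conventional choice, not a requirement of the API.

    import torch
    from transformers import AdamW

    class TinyModel(torch.nn.Module):
        # Placeholder; in practice this is a pretrained transformer.
        def __init__(self):
            super().__init__()
            self.dense = torch.nn.Linear(10, 10)
            self.LayerNorm = torch.nn.LayerNorm(10)

    model = TinyModel()

    # Apply weight decay to everything except biases and LayerNorm weights.
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {"params": [p for n, p in model.named_parameters()
                    if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in model.named_parameters()
                    if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
    ]
    optimizer = AdamW(grouped_parameters, lr=5e-5, correct_bias=True)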

Adafactor (PyTorch)¶
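Adafactor keeps factored second-moment statistics to save memory, and can run either with its own internal step-size rule or with an external learning rate. A sketch of the fixed-learning-rate configuration, with illustrative keyword values:

    import torch
    from transformers import Adafactor

    model = torch.nn.Linear(10, 2)  # placeholder model

    # Use an external fixed learning rate instead of the internal
    # relative-step schedule (setting lr requires relative_step=False).
    optimizer = Adafactor(
        model.parameters(),
        lr=1e-3,
        scale_parameter=False,
        relative_step=False,
        warmup_init=False,
    )

    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()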

AdamWeightDecay (TensorFlow)¶
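On the TensorFlow side, the optimizer can be constructed directly or through the create_optimizer helper, which pairs it with a warmup-then-decay learning-rate schedule. A sketch with illustrative values:

    from transformers import AdamWeightDecay, create_optimizer

    # Direct construction; exclude LayerNorm and bias variables from decay.
    optimizer = AdamWeightDecay(
        learning_rate=5e-5,
        weight_decay_rate=0.01,
        exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"],
    )

    # Or let the helper build the optimizer and its schedule together.
    optimizer, lr_schedule = create_optimizer(
        init_lr=5e-5,
        num_train_steps=1000,
        num_warmup_steps=100,
        weight_decay_rate=0.01,
    )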

Schedules¶

Learning Rate Schedules (PyTorch)¶
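Each schedule is obtained from a get_*_schedule* function that wraps an existing optimizer and is stepped once per optimizer update. A sketch with illustrative step counts:

    import torch
    from transformers import (
        get_constant_schedule,
        get_cosine_schedule_with_warmup,
        get_linear_schedule_with_warmup,
    )

    optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)

    # Linear warmup for 100 steps, then linear decay to zero at step 1000.
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )
    # Alternatives:
    # scheduler = get_cosine_schedule_with_warmup(optimizer, 100, 1000)
    # scheduler = get_constant_schedule(optimizer)

    for _ in range(5):
        optimizer.step()
        scheduler.step()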

Warmup (TensorFlow)¶
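The TF warmup schedule (the WarmUp class) ramps the learning rate up from zero, then hands off to a wrapped decay schedule. A minimal sketch, assuming a polynomial decay as the post-warmup schedule:

    import tensorflow as tf
    from transformers import WarmUp

    # Schedule applied after warmup ends; values are illustrative.
    decay_fn = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=5e-5, decay_steps=900, end_learning_rate=0.0
    )

    # Ramp from 0 to 5e-5 over the first 100 steps, then follow decay_fn.
    lr_schedule = WarmUp(
        initial_learning_rate=5e-5,
        decay_schedule_fn=decay_fn,
        warmup_steps=100,
    )

    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)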

Gradient Strategies¶

GradientAccumulator (TensorFlow)¶
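A sketch of manual accumulation with this utility: gradients from several micro-batches are summed in the accumulator, applied once, then reset. The toy model and the accumulation step count are placeholders.

    import tensorflow as tf
    from transformers import GradientAccumulator

    model = tf.keras.layers.Dense(2)        # placeholder model
    optimizer = tf.keras.optimizers.SGD(0.01)
    accumulator = GradientAccumulator()
    accumulation_steps = 4                  # apply every 4 micro-batches

    for step in range(8):
        x = tf.random.normal((4, 3))        # dummy micro-batch
        with tf.GradientTape() as tape:
            loss = tf.reduce_sum(model(x))
        grads = tape.gradient(loss, model.trainable_variables)
        accumulator(grads)                  # add this micro-batch's gradients
        if (step + 1) % accumulation_steps == 0:
            optimizer.apply_gradients(
                zip(accumulator.gradients, model.trainable_variables)
            )
            accumulator.reset()             # clear accumulated gradients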