Bitsandbytes documentation

LARS

LARS (Layer-wise Adaptive Rate Scaling) is an optimizer designed for large-batch training, where large batches are used to accelerate training. Instead of one learning rate per parameter, LARS uses a separate learning rate for each layer. Each layer's learning rate is calculated from a trust ratio between the norm of the layer's weights and the norm of its gradients. This keeps the size of each update proportionate to the size of the weights it modifies, which stabilizes training.
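The trust ratio described above can be sketched in plain Python. This is a minimal illustration of the general layer-wise scheme, assuming the commonly cited form `||w|| / (||g|| + weight_decay * ||w||)`; it is not the bitsandbytes kernel, and the names `l2_norm`, `trust_ratio`, `lars_step`, and `trust_coefficient` are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def trust_ratio(weights, grads, weight_decay=0.0, eps=1e-12):
    """Layer-wise trust ratio: ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0  # fall back to the unscaled learning rate
    return w_norm / (g_norm + weight_decay * w_norm + eps)


def lars_step(layers, grads, lr, trust_coefficient=1.0, weight_decay=0.0):
    """Return updated layers; every weight in a layer shares one learning rate."""
    updated = []
    for w, g in zip(layers, grads):
        # One local learning rate per layer, not per parameter.
        local_lr = lr * trust_coefficient * trust_ratio(w, g, weight_decay)
        updated.append(
            [wi - local_lr * (gi + weight_decay * wi) for wi, gi in zip(w, g)]
        )
    return updated
```

With `weight_decay=0`, scaling a layer's gradient by a constant leaves the update unchanged, because the trust ratio shrinks by the same factor. That is the sense in which the update size is calibrated to the weights rather than to the raw gradient magnitude.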

LARS

class bitsandbytes.optim.LARS

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum value speeds up the optimizer by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at the given percentile of those norms to improve stability. The default of 100 leaves gradients unclipped.
  • max_unorm (float, defaults to 0.02) — The maximum norm of the update relative to the norm of the weights.

Base LARS optimizer.
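The `percentile_clipping` behavior described in the parameter list can be illustrated with a small stand-alone sketch. It assumes a nearest-rank percentile over a sliding window of recent gradient norms; the class `PercentileClipper` and its method `scale_for` are hypothetical names, and the library's own bookkeeping on GPU tensors may differ in detail.

```python
import math
from collections import deque


class PercentileClipper:
    """Hypothetical helper: tracks recent gradient norms and derives a clip scale."""

    def __init__(self, percentile=100, history=100):
        if not 1 <= percentile <= 100:
            raise ValueError("percentile must be in [1, 100]")
        self.percentile = percentile
        self.norms = deque(maxlen=history)  # only the most recent norms are kept

    def scale_for(self, grad):
        """Record the norm of `grad` and return the factor to multiply it by."""
        norm = math.sqrt(sum(g * g for g in grad))
        self.norms.append(norm)
        ranked = sorted(self.norms)
        # Nearest-rank percentile over the recorded history.
        index = max(0, math.ceil(self.percentile / 100 * len(ranked)) - 1)
        threshold = ranked[index]
        if norm <= threshold or norm == 0.0:
            return 1.0  # gradient is within the threshold: leave it unchanged
        return threshold / norm  # shrink the gradient down to the threshold
```

In this sketch a percentile of 100 always picks the largest recorded norm as the threshold, so the current gradient is never scaled down. Lower percentiles clip gradients whose norm is unusually large compared with recent history.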

LARS8bit

class bitsandbytes.optim.LARS8bit

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum value speeds up the optimizer by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at the given percentile of those norms to improve stability. The default of 100 leaves gradients unclipped.
  • max_unorm (float, defaults to 0.02) — The maximum norm of the update relative to the norm of the weights.

8-bit LARS optimizer.
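To make the role of `min_8bit_size` concrete, the sketch below estimates how much memory a single optimizer state buffer would need. It assumes that tensors with fewer than `min_8bit_size` elements keep 32-bit state while larger tensors use 8-bit state, and it ignores any quantization metadata. The helpers `state_bits` and `state_bytes` and the layer shapes are hypothetical.

```python
import math


def state_bits(num_elements, optim_bits=8, min_8bit_size=4096):
    """Precision (in bits) of the optimizer state for one parameter tensor."""
    if optim_bits == 8 and num_elements < min_8bit_size:
        return 32  # assumption: small tensors keep full-precision state
    return optim_bits


def state_bytes(shapes, optim_bits=8, min_8bit_size=4096):
    """Approximate bytes for one state buffer per tensor (e.g. a momentum buffer)."""
    total = 0
    for shape in shapes.values():
        count = math.prod(shape)
        total += count * state_bits(count, optim_bits, min_8bit_size) // 8
    return total


# Hypothetical layer shapes used only for illustration.
shapes = {"linear.weight": (1024, 1024), "linear.bias": (1024,)}
eight_bit_total = state_bytes(shapes, optim_bits=8)
full_precision_total = state_bytes(shapes, optim_bits=32)
```

Here the large weight matrix accounts for almost all of the savings, while the small bias vector stays in 32-bit under the stated assumption.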

LARS32bit

class bitsandbytes.optim.LARS32bit

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, max_unorm = 0.02 )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum value speeds up the optimizer by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at the given percentile of those norms to improve stability. The default of 100 leaves gradients unclipped.
  • max_unorm (float, defaults to 0.02) — The maximum norm of the update relative to the norm of the weights.

32-bit LARS optimizer.
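Because the three classes documented here share most constructor arguments, one small factory can select among them. The sketch below is illustrative only: `LARS_DEFAULTS`, `lars_kwargs`, and `make_lars` are hypothetical names, the default values are placeholders rather than recommendations, and the non-zero momentum check rests on the assumption that the library may not support LARS without momentum. bitsandbytes is imported lazily, so only `make_lars` requires it to be installed.

```python
# Hypothetical defaults for illustration; tune them for your own model.
LARS_DEFAULTS = {"lr": 0.1, "momentum": 0.9, "weight_decay": 0.0, "max_unorm": 0.02}
VARIANTS = ("LARS", "LARS8bit", "LARS32bit")


def lars_kwargs(**overrides):
    """Merge overrides into the illustrative defaults and sanity-check them."""
    kwargs = {**LARS_DEFAULTS, **overrides}
    if kwargs["momentum"] <= 0:
        # Assumption: LARS without momentum may be rejected by the library.
        raise ValueError("pass a non-zero momentum")
    return kwargs


def make_lars(params, variant="LARS8bit", **overrides):
    """Build one of the three optimizer classes documented on this page."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}")
    import bitsandbytes as bnb  # lazy import: requires bitsandbytes to be installed

    optimizer_cls = getattr(bnb.optim, variant)
    return optimizer_cls(params, **lars_kwargs(**overrides))
```

A typical call would be `optimizer = make_lars(model.parameters(), variant="LARS8bit", lr=0.05)`, followed by the usual `loss.backward()`, `optimizer.step()`, and `optimizer.zero_grad()` loop, where `model` is your own PyTorch module.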