Bitsandbytes documentation

Lion

Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Lion

Lion (Evolved Sign Momentum) is a unique optimizer that uses the sign of the gradient to determine the update direction of the momentum. This makes Lion more memory-efficient and faster than AdamW which tracks and store the first and second-order moments.

Lion

class bitsandbytes.optim.Lion

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

Base Lion optimizer.

Lion8bit

class bitsandbytes.optim.Lion8bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

8-bit Lion optimizer.

Lion32bit

class bitsandbytes.optim.Lion32bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

32-bit Lion optimizer.

PagedLion

class bitsandbytes.optim.PagedLion

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged Lion optimizer.

PagedLion8bit

class bitsandbytes.optim.PagedLion8bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 8-bit Lion optimizer.

PagedLion32bit

class bitsandbytes.optim.PagedLion32bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (dict, defaults to None) — A dictionary with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 32-bit Lion optimizer.