Bitsandbytes documentation

Lion

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.43.0).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Lion

Lion (Evolved Sign Momentum) is a unique optimizer that uses the sign of the gradient to determine the update direction of the momentum. This makes Lion more memory-efficient and faster than AdamW which tracks and store the first and second-order moments.

Lion

class bitsandbytes.optim.Lion

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

Base Lion optimizer.

Lion8bit

class bitsandbytes.optim.Lion8bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

8-bit Lion optimizer.

Lion32bit

class bitsandbytes.optim.Lion32bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True is_paged = False )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
  • is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

32-bit Lion optimizer.

PagedLion

class bitsandbytes.optim.PagedLion

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 optim_bits = 32 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged Lion optimizer.

PagedLion8bit

class bitsandbytes.optim.PagedLion8bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 8-bit Lion optimizer.

PagedLion32bit

class bitsandbytes.optim.PagedLion32bit

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

__init__

< >

( params lr = 0.0001 betas = (0.9, 0.99) weight_decay = 0 args = None min_8bit_size = 4096 percentile_clipping = 100 block_wise = True )

Parameters

  • params (torch.tensor) — The input parameters to optimize.
  • lr (float, defaults to 1e-4) — The learning rate.
  • betas (tuple(float, float), defaults to (0.9, 0.999)) — The beta values are the decay rates of the first and second-order moment of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
  • percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
  • block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 32-bit Lion optimizer.

< > Update on GitHub