Lion

Lion (EvoLved Sign Momentum) is an optimizer that takes the sign of an interpolation between the momentum and the current gradient as its update direction, so every parameter moves by the same magnitude each step. Because it tracks only the momentum, Lion is more memory-efficient and faster than AdamW, which tracks and stores both the first and second-order moments.
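The update rule described above can be sketched in plain Python. This is a minimal illustration of the published Lion rule on lists of floats, not the bitsandbytes kernel; the names `sign` and `lion_step` are hypothetical helpers introduced only for this example.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """Apply one Lion update to plain lists and return (new_params, new_momentum)."""
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Update direction: sign of an interpolation between momentum and gradient.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay, followed by a step of fixed magnitude lr.
        p = p * (1.0 - lr * weight_decay) - lr * direction
        # The momentum is the only optimizer state carried to the next step.
        m = beta2 * m + (1.0 - beta2) * g
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum


params, momentum = lion_step(
    params=[1.0, -2.0, 0.5],
    grads=[0.3, -0.1, 0.0],
    momentum=[0.0, 0.0, 0.0],
    lr=0.1,
)
```

Each parameter with a nonzero direction moves by exactly `lr`, regardless of how large its gradient is, and only one state value per parameter is kept.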

Lion

class bitsandbytes.optim.Lion

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, optim_bits=32, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True, is_paged=False)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

Base Lion optimizer.
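The `percentile_clipping` behavior described in the parameter list can be sketched as follows. This is a simplified illustration under stated assumptions, not the bitsandbytes implementation: the class `PercentileClipper` and its method `scale_for` are hypothetical names, and the percentile is picked with a simple nearest-rank rule.

```python
import math
from collections import deque


class PercentileClipper:
    """Track recent gradient norms and compute a clipping scale factor."""

    def __init__(self, percentile=100, history=100):
        self.percentile = percentile
        # Only the most recent `history` norms are kept.
        self.norms = deque(maxlen=history)

    def scale_for(self, grad):
        """Return the factor to multiply `grad` by (1.0 means no clipping)."""
        norm = math.sqrt(sum(g * g for g in grad))
        self.norms.append(norm)
        ordered = sorted(self.norms)
        # Nearest-rank percentile of the tracked norms.
        rank = math.ceil(self.percentile / 100 * len(ordered))
        threshold = ordered[min(len(ordered) - 1, max(0, rank - 1))]
        if norm > threshold and norm > 0:
            return threshold / norm
        return 1.0


clipper = PercentileClipper(percentile=50)
steady = [clipper.scale_for([1.0]) for _ in range(4)]  # four ordinary steps
spike = clipper.scale_for([10.0])                      # one outlier step

no_clip = PercentileClipper(percentile=100)
default_scales = [no_clip.scale_for([x]) for x in (1.0, 5.0, 50.0)]
```

In this sketch the threshold at percentile 100 is the largest tracked norm, which includes the current one, so the default value never clips.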

Lion8bit

class bitsandbytes.optim.Lion8bit

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True, is_paged=False)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

8-bit Lion optimizer.
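To illustrate why `block_wise` quantization reduces outlier effects, here is a minimal sketch that quantizes a list of values to 8-bit integer codes with one scale per block. It assumes a simple linear absmax code, which is not the quantization map bitsandbytes actually uses; `quantize_blockwise` and `dequantize_blockwise` are hypothetical helpers for this example only.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize values to integer codes in [-127, 127], one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Each block is scaled by its own largest magnitude (1.0 if all zeros).
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Map integer codes back to floats using the scale of their block."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


# The fifth value is an outlier that would dominate a single shared scale.
values = [0.01, -0.02, 0.03, 0.015, 50.0, -0.02, 0.01, 0.03]

codes, scales = quantize_blockwise(values, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)

# With one block covering everything, the small values collapse to zero.
single_codes, _ = quantize_blockwise(values, block_size=8)
```

With two blocks, the outlier only degrades the precision of its own block, while the first block keeps a scale that matches its small values.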

Lion32bit

class bitsandbytes.optim.Lion32bit

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True, is_paged=False)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
is_paged (bool, defaults to False) — Whether the optimizer is a paged optimizer or not.

32-bit Lion optimizer.

PagedLion

class bitsandbytes.optim.PagedLion

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, optim_bits=32, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged Lion optimizer.

PagedLion8bit

class bitsandbytes.optim.PagedLion8bit

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 8-bit Lion optimizer.

PagedLion32bit

class bitsandbytes.optim.PagedLion32bit

(params, lr=0.0001, betas=(0.9, 0.99), weight_decay=0, args=None, min_8bit_size=4096, percentile_clipping=100, block_wise=True)

__init__

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float, defaults to 1e-4) — The learning rate.
betas (tuple(float, float), defaults to (0.9, 0.99)) — The beta values. The first is the interpolation weight between the momentum and the gradient used for the update direction, and the second is the decay rate of the momentum.
weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
args (dict, defaults to None) — A dictionary with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.
percentile_clipping (int, defaults to 100) — Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
block_wise (bool, defaults to True) — Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.

Paged 32-bit Lion optimizer.
