Lion
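Lion (Evolved Sign Momentum) is an optimizer that applies the sign operation to an interpolation of the momentum and the current gradient to determine the update direction, so every parameter moves by the same magnitude at each step. Because it tracks only the momentum, Lion is more memory-efficient and faster than AdamW, which tracks and stores both the first and second-order moments.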
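To make the update concrete, here is a minimal pure-Python sketch of a single Lion step over a list of scalar parameters. It follows the rule from the Lion paper (sign of the momentum/gradient interpolation, decoupled weight decay, then a momentum update). The names `sign` and `lion_step` are hypothetical, and this is an illustration only, not the bitsandbytes kernel.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """Apply one Lion update and return the new (params, momentum) lists."""
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay followed by a fixed-magnitude step.
        p = p * (1.0 - lr * weight_decay) - lr * direction
        # The momentum is the only state carried to the next step.
        m = beta2 * m + (1.0 - beta2) * g
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum
```

Only one state value per parameter (the momentum) is kept between steps, which is where the memory saving over AdamW comes from.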
Lion
class bitsandbytes.optim.Lion
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- optim_bits (int, defaults to 32): The number of bits of the optimizer state.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
- is_paged (bool, defaults to False): Whether the optimizer is a paged optimizer or not.
Base Lion optimizer.
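The percentile_clipping parameter can be pictured with a small pure-Python sketch: keep a window of recent gradient norms, take the requested percentile of that window as the threshold, and rescale any gradient whose norm exceeds it. The class name `PercentileClipper` and its details are assumptions made for illustration; the library's own implementation may differ.

```python
import math
from collections import deque


class PercentileClipper:
    """Clip gradients at a percentile of the most recent gradient norms."""

    def __init__(self, percentile=100, history=100):
        self.percentile = percentile
        self.norms = deque(maxlen=history)  # sliding window of recent norms

    def clip(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        self.norms.append(norm)
        ranked = sorted(self.norms)
        # Position of the requested percentile within the tracked window.
        idx = max(0, math.ceil(len(ranked) * self.percentile / 100) - 1)
        threshold = ranked[idx]
        if norm > threshold and norm > 0:
            scale = threshold / norm
            return [g * scale for g in grad]
        return list(grad)
```

In this sketch, a percentile of 100 never clips, because the threshold is then the largest norm in the window, which includes the current one.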
Lion8bit
class bitsandbytes.optim.Lion8bit
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
- is_paged (bool, defaults to False): Whether the optimizer is a paged optimizer or not.
8-bit Lion optimizer.
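The effect of block_wise can be illustrated with a simplified sketch in which each block of values is scaled by its own absolute maximum before rounding to 8-bit codes. This uses plain linear absmax quantization and a tiny block size for readability; the function names are hypothetical, and the quantization scheme and block size used by the library are not shown here.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize each block to signed 8-bit codes using its own absmax scale."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Map 8-bit codes back to floats using the per-block scales."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]
```

With an outlier such as 100.0 in one block, values near 0.01 in another block keep their precision, whereas a single scale shared by the whole tensor would round them to zero.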
Lion32bit
class bitsandbytes.optim.Lion32bit
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True, is_paged = False )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
- is_paged (bool, defaults to False): Whether the optimizer is a paged optimizer or not.
32-bit Lion optimizer.
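A rough back-of-the-envelope estimate shows how the state precision and min_8bit_size affect memory. The helper below is hypothetical; it counts only the state values themselves (one per parameter for Lion, two for AdamW), ignores any quantization metadata, and assumes that tensors with fewer than min_8bit_size elements keep 32-bit state.

```python
def optimizer_state_bytes(tensor_sizes, states_per_param, optim_bits=32, min_8bit_size=4096):
    """Estimate optimizer state size in bytes for a list of tensor element counts."""
    total = 0
    for numel in tensor_sizes:
        # Assumption: small tensors stay in 32-bit even when 8-bit state is requested.
        bits = 32 if (optim_bits == 8 and numel < min_8bit_size) else optim_bits
        total += numel * states_per_param * bits // 8
    return total


sizes = [1_000_000, 1024]  # one large weight matrix and one small bias
lion_32bit = optimizer_state_bytes(sizes, states_per_param=1, optim_bits=32)
lion_8bit = optimizer_state_bytes(sizes, states_per_param=1, optim_bits=8)
adamw_32bit = optimizer_state_bytes(sizes, states_per_param=2, optim_bits=32)
```

For these sizes the estimate is about 4.0 MB for 32-bit Lion, about 1.0 MB for 8-bit Lion, and about 8.0 MB for 32-bit AdamW.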
PagedLion
class bitsandbytes.optim.PagedLion
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- optim_bits (int, defaults to 32): The number of bits of the optimizer state.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
Paged Lion optimizer.
PagedLion8bit
class bitsandbytes.optim.PagedLion8bit
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
Paged 8-bit Lion optimizer.
PagedLion32bit
class bitsandbytes.optim.PagedLion32bit
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
__init__
( params, lr = 0.0001, betas = (0.9, 0.99), weight_decay = 0, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
Parameters
- params (torch.tensor): The input parameters to optimize.
- lr (float, defaults to 1e-4): The learning rate.
- betas (tuple(float, float), defaults to (0.9, 0.99)): The beta coefficients. The first weights the momentum against the current gradient when computing the update, and the second is the decay rate of the momentum.
- weight_decay (float, defaults to 0): The weight decay value for the optimizer.
- args (object, defaults to None): An object with additional arguments.
- min_8bit_size (int, defaults to 4096): The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100): Adapts the clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- block_wise (bool, defaults to True): Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
Paged 32-bit Lion optimizer.
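The six classes on this page differ only in state precision and paging, so choosing among them can be wrapped in a small helper. The sketch below is one possible way to do that; `lion_class_name` and `make_lion` are hypothetical names, and `make_lion` assumes bitsandbytes is installed and importable as `bitsandbytes`.

```python
def lion_class_name(optim_bits=None, paged=False):
    """Return the name of the Lion class matching the requested variant."""
    if optim_bits not in (None, 8, 32):
        raise ValueError("optim_bits must be None, 8, or 32")
    suffix = "" if optim_bits is None else f"{optim_bits}bit"
    return ("Paged" if paged else "") + "Lion" + suffix


def make_lion(params, optim_bits=None, paged=False, **kwargs):
    """Instantiate the matching optimizer from bitsandbytes.optim."""
    import bitsandbytes as bnb  # imported lazily so the name helper works without it

    optimizer_cls = getattr(bnb.optim, lion_class_name(optim_bits, paged))
    return optimizer_cls(params, **kwargs)
```

For example, `make_lion(model.parameters(), optim_bits=8, paged=True, lr=1e-4)` would construct a `bitsandbytes.optim.PagedLion8bit` instance, where `model` stands for any PyTorch module.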