Optimization

This page contains the API reference documentation for the optimizers included in timm.

Optimizers

Factory functions

timm.optim.create_optimizer_v2

(
    model_or_params: Union[torch.nn.Module, Iterable[torch.Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, torch.Tensor]]],
    opt: str = 'sgd',
    lr: Optional[float] = None,
    weight_decay: float = 0.0,
    momentum: float = 0.9,
    foreach: Optional[bool] = None,
    filter_bias_and_bn: bool = True,
    fallback_list: Collection[str] = (),
    fallback_no_weight_decay: bool = False,
    layer_decay: Optional[float] = None,
    layer_decay_min_scale: float = 0.0,
    layer_decay_no_opt_scale: Optional[float] = None,
    param_group_fn: Optional[Callable[[torch.nn.Module], Union[Iterable[torch.Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, torch.Tensor]]]]] = None,
    **kwargs: Any,
)

Parameters

model_or_params — A PyTorch model or an iterable of parameters/parameter groups. If a model is provided, parameters will be automatically extracted and grouped based on the other arguments.
opt — Name of the optimizer to create (e.g., ‘adam’, ‘adamw’, ‘sgd’). Use list_optimizers() to see available options.
lr — Learning rate. If None, will use the optimizer’s default.
weight_decay — Weight decay factor. Will be used to create param groups if model_or_params is a model.
momentum — Momentum factor for optimizers that support it. Only used if the chosen optimizer accepts a momentum parameter.
foreach — Enable/disable foreach (multi-tensor) implementation if available. If None, will use optimizer-specific defaults.
filter_bias_and_bn — If True, biases and normalization-layer parameters (all 1-D parameters) are excluded from weight decay. Only used when model_or_params is a model and weight_decay > 0.
fallback_list — Collection of parameter name patterns whose parameters should use the fallback optimizer of a hybrid optimizer (e.g., AdamW for Muon). Supports wildcard matching.
fallback_no_weight_decay — If True, parameters in the model's no_weight_decay() list use the fallback optimizer of a hybrid optimizer (e.g., AdamW for Muon).
layer_decay — Optional layer-wise learning rate decay factor. If provided, learning rates will be scaled by layer_decay^(max_depth - layer_depth). Only used when model_or_params is a model.
param_group_fn — Optional function to create custom parameter groups. If provided, other parameter grouping options will be ignored.
**kwargs — Additional optimizer-specific arguments (e.g., betas for Adam).
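The filter_bias_and_bn and fallback_list rules can be illustrated in plain Python. This is a minimal sketch, not timm's implementation: parameters are represented as hypothetical (name, ndim) pairs instead of tensors, split_weight_decay and uses_fallback are illustrative helpers, and fnmatch-style wildcard semantics for fallback_list are an assumption.

```python
from fnmatch import fnmatch


def split_weight_decay(
    named_ndims: list[tuple[str, int]],
    weight_decay: float,
    no_weight_decay: set[str] = frozenset(),
) -> list[dict]:
    """Hypothetical helper: split parameter names into no-decay and decay groups.

    1-D parameters (biases, norm scales/shifts) and names listed in
    no_weight_decay get weight_decay=0; everything else keeps weight_decay.
    """
    decay, no_decay = [], []
    for name, ndim in named_ndims:
        if ndim <= 1 or name in no_weight_decay:
            no_decay.append(name)
        else:
            decay.append(name)
    return [
        {"params": no_decay, "weight_decay": 0.0},
        {"params": decay, "weight_decay": weight_decay},
    ]


def uses_fallback(name: str, fallback_patterns: list[str]) -> bool:
    """Hypothetical helper: True if the name matches any wildcard pattern (fnmatch assumed)."""
    return any(fnmatch(name, pattern) for pattern in fallback_patterns)


# Stand-in for model.named_parameters(): (parameter name, number of dims).
params = [
    ("stem.conv.weight", 4),
    ("stem.bn.weight", 1),
    ("stem.bn.bias", 1),
    ("pos_embed", 3),
    ("head.fc.weight", 2),
    ("head.fc.bias", 1),
]

groups = split_weight_decay(params, weight_decay=1e-4, no_weight_decay={"pos_embed"})
print(groups[0]["params"])  # ['stem.bn.weight', 'stem.bn.bias', 'pos_embed', 'head.fc.bias']
print(groups[1]["params"])  # ['stem.conv.weight', 'head.fc.weight']
print([n for n, _ in params if uses_fallback(n, ["head.*", "*embed*"])])
# ['pos_embed', 'head.fc.weight', 'head.fc.bias']
```

The two-group layout (a zero-decay group plus a decayed group) is the usual way to express per-parameter weight decay with PyTorch param groups.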

Creates an optimizer instance via the timm optimizer registry.

Creates and configures an optimizer with appropriate parameter groups and settings. Supports automatic creation of parameter groups for weight decay and layer-wise learning rate decay, as well as custom parameter grouping.

Examples:

Basic usage with a model

    optimizer = create_optimizer_v2(model, 'adamw', lr=1e-3)

SGD with momentum and weight decay

    optimizer = create_optimizer_v2(
        model, 'sgd', lr=0.1, momentum=0.9, weight_decay=1e-4
    )

Adam with layer-wise learning rate decay

    optimizer = create_optimizer_v2(
        model, 'adam', lr=1e-3, layer_decay=0.7
    )
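With layer_decay=0.7, each layer's learning rate is scaled by layer_decay^(max_depth - layer_depth), so the layers at max_depth keep the base rate and earlier layers get geometrically smaller rates. The sketch below works this out for a hypothetical model with max_depth=3. layer_lr_scales is an illustrative helper; how timm assigns depths to real layers, and the layer_decay_min_scale clamp, are not modeled here.

```python
def layer_lr_scales(layer_decay: float, max_depth: int) -> list[float]:
    """Hypothetical helper: scale for depths 0..max_depth, i.e. layer_decay ** (max_depth - depth)."""
    return [layer_decay ** (max_depth - depth) for depth in range(max_depth + 1)]


base_lr = 1e-3
scales = layer_lr_scales(0.7, max_depth=3)
print([round(s, 3) for s in scales])  # [0.343, 0.49, 0.7, 1.0]

# Effective per-depth learning rates; the layer at max_depth keeps base_lr.
per_layer_lr = [base_lr * s for s in scales]
```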

Custom parameter groups

    def group_fn(model):
        return [
            {'params': model.backbone.parameters(), 'lr': 1e-4},
            {'params': model.head.parameters(), 'lr': 1e-3},
        ]

    optimizer = create_optimizer_v2(
        model, 'sgd', param_group_fn=group_fn
    )

Note: parameter groups are chosen in the following order of precedence:

1. If param_group_fn is provided, it is used exclusively.
2. Otherwise, if layer_decay is provided, layer-wise groups are created.
3. Otherwise, if weight_decay > 0 and filter_bias_and_bn is True, weight decay groups are created.
4. Otherwise, all parameters are placed in a single group.
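The precedence rules can be read as a simple decision chain, applied when a model (rather than an iterable of parameters) is passed. The function below is a hypothetical sketch that only mirrors the note: grouping_strategy and its return labels are illustrative names, not part of timm.

```python
from typing import Any, Callable, Optional


def grouping_strategy(
    param_group_fn: Optional[Callable[..., Any]] = None,
    layer_decay: Optional[float] = None,
    weight_decay: float = 0.0,
    filter_bias_and_bn: bool = True,
) -> str:
    """Hypothetical helper: report which grouping path the precedence note selects."""
    if param_group_fn is not None:
        return "custom"        # custom groups; other grouping options are ignored
    if layer_decay is not None:
        return "layer_decay"   # layer-wise learning rate groups
    if weight_decay > 0 and filter_bias_and_bn:
        return "weight_decay"  # decay / no-decay groups
    return "single"            # one group holding every parameter


print(grouping_strategy(layer_decay=0.7, weight_decay=0.05))           # layer_decay
print(grouping_strategy(weight_decay=1e-4))                            # weight_decay
print(grouping_strategy(weight_decay=1e-4, filter_bias_and_bn=False))  # single
```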
