DeepSpeed utilities

DeepSpeedPlugin

get_active_deepspeed_plugin

accelerate.utils.get_active_deepspeed_plugin

( state )

Raises

ValueError — If DeepSpeed was not enabled and this function is called.

Returns the currently active DeepSpeedPlugin.
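Because the function raises instead of returning None, callers that may run without DeepSpeed can guard the call. The following is a minimal sketch, assuming `state` is the accelerator's state object (for example `accelerator.state`). The helper name `active_plugin_or_none` and its `getter` parameter are hypothetical; they exist only so the pattern can be exercised without DeepSpeed installed.

```python
def active_plugin_or_none(state, getter=None):
    """Return the active DeepSpeedPlugin, or None when DeepSpeed is not enabled.

    `getter` is a hypothetical injection point for testing; by default the
    real accelerate function is used.
    """
    if getter is None:
        # Imported lazily so the helper can be defined without accelerate installed.
        from accelerate.utils import get_active_deepspeed_plugin as getter
    try:
        return getter(state)
    except ValueError:
        # Documented behaviour: raised when DeepSpeed was not enabled.
        return None
```

A caller would typically pass the state of an existing `Accelerator` and branch on whether the result is None.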

class accelerate.DeepSpeedPlugin

( hf_ds_config: typing.Any = None gradient_accumulation_steps: int = None gradient_clipping: float = None zero_stage: int = None is_train_batch_min: bool = True offload_optimizer_device: str = None offload_param_device: str = None offload_optimizer_nvme_path: str = None offload_param_nvme_path: str = None zero3_init_flag: bool = None zero3_save_16bit_model: bool = None transformer_moe_cls_names: str = None enable_msamp: bool = None msamp_opt_level: typing.Optional[typing.Literal['O1', 'O2']] = None )

Parameters

hf_ds_config (Any, defaults to None) — Path to a DeepSpeed config file, a dict, or an object of class accelerate.utils.deepspeed.HfDeepSpeedConfig.
gradient_accumulation_steps (int, defaults to None) — Number of steps to accumulate gradients before updating optimizer states. If not set, will use the value from the Accelerator directly.
gradient_clipping (float, defaults to None) — Enables gradient clipping with the given value.
zero_stage (int, defaults to None) — Possible options are 0, 1, 2, 3. Default will be taken from environment variable.
is_train_batch_min (bool, defaults to True) — If both train & eval dataloaders are specified, this will decide the train_batch_size.
offload_optimizer_device (str, defaults to None) — Possible options are none|cpu|nvme. Only applicable with ZeRO Stages 2 and 3.
offload_param_device (str, defaults to None) — Possible options are none|cpu|nvme. Only applicable with ZeRO Stage 3.
offload_optimizer_nvme_path (str, defaults to None) — Possible options are /nvme|/local_nvme. Only applicable with ZeRO Stage 3.
offload_param_nvme_path (str, defaults to None) — Possible options are /nvme|/local_nvme. Only applicable with ZeRO Stage 3.
zero3_init_flag (bool, defaults to None) — Flag to indicate whether to enable deepspeed.zero.Init for constructing massive models. Only applicable with ZeRO Stage-3.
zero3_save_16bit_model (bool, defaults to None) — Flag to indicate whether to save 16-bit model. Only applicable with ZeRO Stage-3.
transformer_moe_cls_names (str, defaults to None) — Comma-separated list of Transformers MoE layer class names (case-sensitive). For example, MixtralSparseMoeBlock, Qwen2MoeSparseMoeBlock, JetMoEAttention, JetMoEBlock, etc.
enable_msamp (bool, defaults to None) — Flag to indicate whether to enable MS-AMP backend for FP8 training.
msamp_opt_level (Optional[Literal["O1", "O2"]], defaults to None) — Optimization level for MS-AMP (defaults to 'O1'). Only applicable if enable_msamp is True. Should be one of 'O1' or 'O2'.

This plugin is used to integrate DeepSpeed.
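The parameters above largely mirror keys of a DeepSpeed config, which can also be passed whole through `hf_ds_config`. The sketch below builds such a dict with the standard library only. The helper names `build_ds_config` and `make_plugin` are hypothetical, and the mapping from plugin arguments to config keys is an assumption based on the public DeepSpeed config schema, not a copy of Accelerate's internals.

```python
def build_ds_config(
    zero_stage=3,
    offload_optimizer_device="cpu",
    offload_param_device="cpu",
    gradient_accumulation_steps=4,
    gradient_clipping=1.0,
    save_16bit_model=True,
):
    """Build a DeepSpeed-style config dict (illustrative helper, not part of accelerate)."""
    zero = {"stage": zero_stage}
    # Optimizer offload is documented above as applicable to ZeRO Stages 2 and 3.
    if zero_stage >= 2 and offload_optimizer_device != "none":
        zero["offload_optimizer"] = {"device": offload_optimizer_device}
    # Parameter offload and 16-bit saving are documented as Stage 3 only.
    if zero_stage == 3:
        if offload_param_device != "none":
            zero["offload_param"] = {"device": offload_param_device}
        zero["stage3_gather_16bit_weights_on_model_save"] = save_16bit_model
    return {
        # "auto" leaves the value to be filled in at prepare time (assumption).
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "gradient_clipping": gradient_clipping,
        "zero_optimization": zero,
    }


def make_plugin(ds_config):
    """Wrap the dict in a DeepSpeedPlugin; requires accelerate to be installed."""
    from accelerate import DeepSpeedPlugin  # lazy import keeps the sketch importable

    return DeepSpeedPlugin(hf_ds_config=ds_config)
```

The resulting plugin would then be handed to `Accelerator(deepspeed_plugin=...)`. Passing individual keyword arguments such as `zero_stage=2` instead of a dict is the other documented route.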

deepspeed_config_process

( prefix = '' mismatches = None config = None must_match = True **kwargs )

Process the DeepSpeed config with the values from the kwargs.

select

( _from_accelerator_state: bool = False )

Sets the HfDeepSpeedWeakref to use the current DeepSpeed plugin configuration.

class accelerate.utils.DummyScheduler

( optimizer total_num_steps = None warmup_num_steps = 0 lr_scheduler_callable = None **kwargs )

Parameters

optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.
total_num_steps (int, optional) — Total number of steps.
warmup_num_steps (int, optional) — Number of steps for warmup.
lr_scheduler_callable (callable, optional) — A callable function that creates an LR Scheduler. It accepts only one argument optimizer.
**kwargs (additional keyword arguments, optional) — Other arguments.

Dummy scheduler that presents model parameters or param groups. It is primarily used to follow a conventional training loop when the scheduler config is specified in the DeepSpeed config file.
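As an illustration of `lr_scheduler_callable`, the sketch below defines a factory that takes only an optimizer and returns a `LambdaLR`, then passes it to `DummyScheduler`. The warmup-then-decay shape and the helper names (`linear_warmup_decay`, `make_scheduler_factory`, `make_dummy_scheduler`) are hypothetical choices for the example, not something the library prescribes.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """LR multiplier: linear ramp from 0 to 1, then linear decay back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


def make_scheduler_factory(warmup_steps, total_steps):
    """Return a callable that accepts only `optimizer`, as the parameter requires."""

    def factory(optimizer):
        from torch.optim.lr_scheduler import LambdaLR  # lazy: needs torch

        return LambdaLR(
            optimizer,
            lambda step: linear_warmup_decay(step, warmup_steps, total_steps),
        )

    return factory


def make_dummy_scheduler(optimizer, warmup_steps=100, total_steps=1000):
    """Build a DummyScheduler around the factory; requires accelerate."""
    from accelerate.utils import DummyScheduler

    return DummyScheduler(
        optimizer,
        total_num_steps=total_steps,
        warmup_num_steps=warmup_steps,
        lr_scheduler_callable=make_scheduler_factory(warmup_steps, total_steps),
    )
```

Keeping the multiplier as a plain function makes the schedule easy to check in isolation before it is wired into a training run.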

DeepSpeedEngineWrapper

class accelerate.utils.DeepSpeedEngineWrapper

( engine )

Parameters

engine (deepspeed.runtime.engine.DeepSpeedEngine) — The DeepSpeed engine to wrap.

Internal wrapper for deepspeed.runtime.engine.DeepSpeedEngine. This is used to follow a conventional training loop.

get_global_grad_norm

( )

Get the global gradient norm from the DeepSpeed engine.

DeepSpeedOptimizerWrapper

class accelerate.utils.DeepSpeedOptimizerWrapper

( optimizer )

Parameters

optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.

Internal wrapper around a DeepSpeed optimizer.

DeepSpeedSchedulerWrapper

class accelerate.utils.DeepSpeedSchedulerWrapper

( scheduler optimizers )

Parameters

scheduler (torch.optim.lr_scheduler.LambdaLR) — The scheduler to wrap.
optimizers (one or a list of torch.optim.Optimizer) —

Internal wrapper around a DeepSpeed scheduler.

DummyOptim

class accelerate.utils.DummyOptim

( params lr = 0.001 weight_decay = 0 **kwargs )

Parameters

params (iterable) — Iterable of parameters to optimize, or dicts defining parameter groups.
lr (float) — Learning rate.
weight_decay (float) — Weight decay.
**kwargs (additional keyword arguments, optional) — Other arguments.

Dummy optimizer that presents model parameters or param groups. It is primarily used to follow a conventional training loop when the optimizer config is specified in the DeepSpeed config file.
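One way to keep a single training script working both with and without an optimizer section in the DeepSpeed config is sketched below. The helper names `config_defines` and `build_optimizer` are hypothetical, the choice of `torch.optim.AdamW` as the fallback is only an example, and how the config dict is obtained in a real script is left as an assumption.

```python
def config_defines(ds_config, section):
    """True when the DeepSpeed config dict contains the given top-level section."""
    return isinstance(ds_config, dict) and section in ds_config


def build_optimizer(params, ds_config, lr=1e-3, weight_decay=0.0):
    """Pick DummyOptim when DeepSpeed owns the optimizer, else a real one."""
    if config_defines(ds_config, "optimizer"):
        # DeepSpeed creates the real optimizer from its config; the dummy
        # object only carries params and hyperparameters through the loop.
        from accelerate.utils import DummyOptim

        return DummyOptim(params, lr=lr, weight_decay=weight_decay)
    # No optimizer section: fall back to a regular PyTorch optimizer.
    import torch

    return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
```

The returned object is then passed to `accelerator.prepare(...)` like any other optimizer.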
