Accelerate documentation

Utilities for Megatron-LM

You are viewing v0.17.1 version. A newer version v1.2.1 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Utilities for Megatron-LM

class accelerate.utils.MegatronLMPlugin

< >

( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activation: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: typing.List[int] = None no_wd_decay_cond: typing.Optional[typing.Callable] = None scale_lr_cond: typing.Optional[typing.Callable] = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: typing.Optional[typing.Any] = None custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None custom_model_provider_function: typing.Optional[typing.Callable] = None custom_prepare_model_function: typing.Optional[typing.Callable] = None other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )

Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism. Also to enable selective activation recomputation and optimized fused kernels.

class accelerate.utils.MegatronLMDummyScheduler

< >

( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )

Parameters

  • optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.
  • total_num_steps (int) — Total number of steps.
  • warmup_num_steps (int) — Number of steps for warmup. **kwargs — Other arguments.

Dummy scheduler presents model parameters or param groups, this is primarily used to follow conventional training loop when scheduler config is specified in the deepspeed config file.

class accelerate.utils.MegatronLMDummyDataLoader

< >

( **dataset_kwargs )

Dummy dataloader presents model parameters or param groups, this is primarily used to follow conventional training

class accelerate.utils.AbstractTrainStep

< >

( name )

Abstract class for batching, forward pass and loss handler.

class accelerate.utils.GPTTrainStep

< >

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

GPT train step class.

class accelerate.utils.BertTrainStep

< >

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

Bert train step class.

class accelerate.utils.T5TrainStep

< >

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

T5 train step class.

accelerate.utils.avg_losses_across_data_parallel_group

< >

( losses )

Parameters

  • losses (List[Tensor]) — List of losses to average across data parallel group.

Average losses across data parallel group.