Utilities for Megatron-LM
class accelerate.utils.MegatronLMPlugin
< source >( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activation: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: typing.List[int] = None no_wd_decay_cond: typing.Optional[typing.Callable] = None scale_lr_cond: typing.Optional[typing.Callable] = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: typing.Optional[typing.Any] = None custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None custom_model_provider_function: typing.Optional[typing.Callable] = None custom_prepare_model_function: typing.Optional[typing.Callable] = None other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )
Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism. Also to enable selective activation recomputation and optimized fused kernels.
class accelerate.utils.MegatronLMDummyScheduler
< source >( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )
Dummy scheduler presents model parameters or param groups, this is primarily used to follow conventional training loop when scheduler config is specified in the deepspeed config file.
Dummy dataloader presents model parameters or param groups, this is primarily used to follow conventional training
Abstract class for batching, forward pass and loss handler.
class accelerate.utils.GPTTrainStep
< source >( args )
GPT train step class.
class accelerate.utils.BertTrainStep
< source >( args )
Bert train step class.
class accelerate.utils.T5TrainStep
< source >( args )
T5 train step class.
accelerate.utils.avg_losses_across_data_parallel_group
< source >( losses )
Average losses across data parallel group.