Kwargs Handlers
The following objects can be passed to the main Accelerator, through its kwargs_handlers argument, to customize how some PyTorch objects related to distributed training or mixed precision are created.
DistributedDataParallelKwargs
class accelerate.DistributedDataParallelKwargs
(dim: int = 0, broadcast_buffers: bool = True, bucket_cap_mb: int = 25, find_unused_parameters: bool = False, check_reduction: bool = False, gradient_as_bucket_view: bool = False, static_graph: bool = False)
Use this object in your Accelerator to customize how your model is wrapped in a torch.nn.parallel.DistributedDataParallel. Please refer to the documentation of this wrapper for more information on each argument.
gradient_as_bucket_view is only available in PyTorch 1.7.0 and later versions.
static_graph is only available in PyTorch 1.11.0 and later versions.
GradScalerKwargs
class accelerate.GradScalerKwargs
(init_scale: float = 65536.0, growth_factor: float = 2.0, backoff_factor: float = 0.5, growth_interval: int = 2000, enabled: bool = True)
Use this object in your Accelerator to customize the behavior of mixed precision, specifically how the torch.cuda.amp.GradScaler it uses is created. Please refer to the documentation of this scaler for more information on each argument.
GradScaler is only available in PyTorch 1.5.0 and later versions.
InitProcessGroupKwargs
class accelerate.InitProcessGroupKwargs
(backend: typing.Optional[str] = 'nccl', init_method: typing.Optional[str] = None, timeout: timedelta = datetime.timedelta(seconds=1800))
Use this object in your Accelerator to customize the initialization of the distributed processes. Please refer to the documentation of torch.distributed.init_process_group for more information on each argument.