Kwargs Handlers

The following objects can be passed to the main Accelerator to customize how some PyTorch objects related to distributed training or mixed precision are created.

DistributedDataParallelKwargs

class accelerate.DistributedDataParallelKwargs(dim: int = 0, broadcast_buffers: bool = True, bucket_cap_mb: int = 25, find_unused_parameters: bool = False, check_reduction: bool = False, gradient_as_bucket_view: bool = False)[source]

Use this object in your Accelerator to customize how your model is wrapped in a torch.nn.parallel.DistributedDataParallel. Please refer to the documentation of this wrapper for more information on each argument.

Warning

gradient_as_bucket_view is only available in PyTorch 1.7.0 and later versions.
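As a minimal sketch, a handler like this is passed to the Accelerator through its kwargs_handlers argument; the find_unused_parameters value below is only illustrative, and the model, optimizer and dataloader are assumed to be defined elsewhere.

from accelerate import Accelerator, DistributedDataParallelKwargs

# Customize how the model will be wrapped in DistributedDataParallel
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# On multi-GPU setups, prepared models are wrapped with the DDP options given above
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)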

GradScalerKwargs

class accelerate.GradScalerKwargs(init_scale: float = 65536.0, growth_factor: float = 2.0, backoff_factor: float = 0.5, growth_interval: int = 2000, enabled: bool = True)[source]

Use this object in your Accelerator to customize the behavior of mixed precision, specifically how the underlying torch.cuda.amp.GradScaler is created. Please refer to the documentation of this scaler for more information on each argument.

Warning

GradScaler is only available in PyTorch 1.5.0 and later versions.
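As a minimal sketch, the initial loss scale and growth interval of the scaler can be adjusted as below. The mixed_precision="fp16" argument is an assumption about how fp16 is enabled; older Accelerate releases exposed a different flag for this.

from accelerate import Accelerator, GradScalerKwargs

# Customize the GradScaler created for fp16 training
scaler_kwargs = GradScalerKwargs(init_scale=1024.0, growth_interval=1000)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[scaler_kwargs])

These settings only take effect when mixed precision is enabled; in full precision no GradScaler is created and the handler has no effect.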