Accelerate documentation

Accelerator

The Accelerator is the main class provided by 🤗 Accelerate. It serves as the main entry point for the API. To quickly adapt your script to work on any kind of setup with 🤗 Accelerate, just:

  1. Initialize an Accelerator object (that we will call accelerator in the rest of this page) as early as possible in your script.
  2. Pass along your model(s), optimizer(s), dataloader(s) to the prepare() method.
  3. (Optional but best practice) Remove all the .cuda() or .to(device) in your code and let the accelerator handle device placement for you.
  4. Replace the loss.backward() in your code by accelerator.backward(loss).
  5. (Optional, when using distributed evaluation) Gather your predictions and labels with gather() before storing them or using them for metric computation.

This is all that is needed in most cases. For more advanced cases or a nicer experience, here are the functions you should search for and replace with the corresponding methods of your accelerator:

  • print statements should be replaced by print() so that they are only printed once per server.
  • Use is_local_main_process for statements that should be executed once per server.
  • Use is_main_process for statements that should be executed once only.
  • Use wait_for_everyone() to make sure all processes join that point before continuing (useful before a model save for instance).
  • Use unwrap_model() to unwrap your model before saving it.
  • Use save() instead of torch.save.
  • Use clip_grad_norm_() instead of torch.nn.utils.clip_grad_norm_ and clip_grad_value_() instead of torch.nn.utils.clip_grad_value_.

class accelerate.Accelerator

( device_placement: bool = True split_batches: bool = False fp16: bool = None mixed_precision: typing.Union[accelerate.utils.dataclasses.PrecisionType, str] = None cpu: bool = False deepspeed_plugin: DeepSpeedPlugin = None fsdp_plugin: FullyShardedDataParallelPlugin = None rng_types: typing.Union[typing.List[typing.Union[str, accelerate.utils.dataclasses.RNGType]], NoneType] = None log_with: typing.Union[typing.List[typing.Union[str, accelerate.utils.dataclasses.LoggerType, accelerate.tracking.GeneralTracker]], NoneType] = None logging_dir: typing.Union[str, os.PathLike, NoneType] = None dispatch_batches: typing.Optional[bool] = None step_scheduler_with_optimizer: bool = True kwargs_handlers: typing.Optional[typing.List[accelerate.utils.dataclasses.KwargsHandler]] = None )

Parameters

  • device_placement (bool, optional, defaults to True) — Whether or not the accelerator should put objects on device (tensors yielded by the dataloader, model, etc…).
  • split_batches (bool, optional, defaults to False) — Whether or not the accelerator should split the batches yielded by the dataloaders across the devices. If True the actual batch size used will be the same on any kind of distributed processes, but it must be a round multiple of the num_processes you are using. If False, actual batch size used will be the one set in your script multiplied by the number of processes.
  • mixed_precision (str, optional) — Whether or not to use mixed precision training (fp16 or bfloat16). Choose from ‘no’, ‘fp16’, ‘bf16’. Will default to the value in the environment variable MIXED_PRECISION, which will use the default value in the accelerate config of the current system or the flag passed with the accelerate launch command. ‘fp16’ requires PyTorch 1.6 or higher. ‘bf16’ requires PyTorch 1.10 or higher.
  • cpu (bool, optional) — Whether or not to force the script to execute on CPU. If set to True, any available GPU is ignored and execution is forced onto a single process.
  • deepspeed_plugin (DeepSpeedPlugin, optional) — Tweak your DeepSpeed-related args using this argument. This argument is optional and can be configured directly using accelerate config.
  • fsdp_plugin (FullyShardedDataParallelPlugin, optional) — Tweak your FSDP-related args using this argument. This argument is optional and can be configured directly using accelerate config.
  • rng_types (list of str or RNGType) — The list of random number generators to synchronize at the beginning of each iteration in your prepared dataloaders. Should be one or several of:

    • "torch": the base torch random number generator
    • "cuda": the CUDA random number generator (GPU only)
    • "xla": the XLA random number generator (TPU only)
    • "generator": the torch.Generator of the sampler (or batch sampler if there is no sampler in your dataloader) or of the iterable dataset (if it exists) if the underlying dataset is of that type.

    Will default to ["torch"] for PyTorch versions <=1.5.1 and ["generator"] for PyTorch versions >= 1.6.

  • log_with (list of str, LoggerType or GeneralTracker, optional) — A list of loggers to set up for experiment tracking. Should be one or several of:

    • "all"
    • "tensorboard"
    • "wandb"
    • "comet_ml"

    If "all" is selected, will pick up all available trackers in the environment and initialize them. Can also accept implementations of GeneralTracker for custom trackers, and can be combined with "all".

  • logging_dir (str, os.PathLike, optional) — A path to a directory for storing logs of locally-compatible loggers.
  • dispatch_batches (bool, optional) — If set to True, the dataloader prepared by the Accelerator is only iterated through on the main process and then the batches are split and broadcast to each process. Will default to True for DataLoader whose underlying dataset is an IterableDataset, False otherwise.
  • step_scheduler_with_optimizer (bool, optional, defaults to True) — Set to True if the learning rate scheduler is stepped at the same time as the optimizer, False if it is only stepped under certain circumstances (at the end of each epoch, for instance).
  • kwargs_handlers (List[KwargsHandler], optional) — A list of KwargsHandler to customize how the objects related to distributed training or mixed precision are created. See kwargs for more information.
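
The batch-size arithmetic described for split_batches can be written out as a small helper. The function name effective_batch_sizes is hypothetical and not part of the library; it only restates the rule given in the parameter description.

```python
def effective_batch_sizes(script_batch_size, num_processes, split_batches):
    """Return (per_process_batch_size, total_batch_size) for one training step.

    Illustrative helper, not a library function.
    """
    if split_batches:
        # Each batch from the dataloader is split across the processes,
        # so it has to divide evenly.
        if script_batch_size % num_processes != 0:
            raise ValueError("batch size must be a multiple of num_processes")
        return script_batch_size // num_processes, script_batch_size
    # Each process receives its own full batch.
    return script_batch_size, script_batch_size * num_processes


print(effective_batch_sizes(32, 4, split_batches=True))   # (8, 32)
print(effective_batch_sizes(32, 4, split_batches=False))  # (32, 128)
```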

Creates an instance of an accelerator for distributed training (on multi-GPU, TPU) or mixed precision training.

Attributes

  • device (torch.device) — The device to use.
  • state (AcceleratorState) — The distributed setup state.

autocast

( )

Applies automatic mixed precision to the code inside this context manager's block, if it is enabled. Otherwise it has no effect.

backward

( loss **kwargs )

Use accelerator.backward(loss) in lieu of loss.backward().

clear

( )

Alias for Accelerator.free_memory. Releases all references to the internally stored objects and calls the garbage collector. You should call this method between two training runs that use different models or optimizers.

clip_grad_norm_

( parameters max_norm norm_type = 2 )

Should be used in place of torch.nn.utils.clip_grad_norm_.

clip_grad_value_

( parameters clip_value )

Should be used in place of torch.nn.utils.clip_grad_value_.

end_training

( )

Runs any special end-of-training behaviors, such as stopping trackers.

free_memory

( )

Releases all references to the internally stored objects and calls the garbage collector. You should call this method between two training runs that use different models or optimizers.

gather

( tensor ) → torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor

Parameters

  • tensor (torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor) — The tensors to gather across all processes.

Returns

torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor

The gathered tensor(s). Note that the first dimension of the result is num_processes multiplied by the first dimension of the input tensors.

Gather the values in tensor across all processes and concatenate them on the first dimension. Useful to regroup the predictions from all processes when doing evaluation.

Note: This gather happens in all processes.

init_trackers

( project_name: str config: typing.Optional[dict] = None )

Parameters

  • project_name (str) — The name of the project. All trackers will save their data based on this name.
  • config (dict, optional) — Optional starting configuration to be logged.

Initializes a run for all trackers stored in self.log_with, optionally with a starting configuration.

load_state

( input_dir: str )

Parameters

  • input_dir (str or os.PathLike) — The name of the folder all relevant weights and states were saved in.

Loads the current states of the model, optimizer, scaler, RNG generators, and registered objects.

local_main_process_first

( )

Lets the local main process go first inside a with block.

The other processes will enter the with block after the local main process exits.

log

( values: dict step: typing.Optional[int] = None )

Parameters

  • values (dict) — Values should be a dictionary-like object containing only types int, float, or str.
  • step (int, optional) — The run step. If included, the log will be associated with this step.

Logs values to all stored trackers in self.trackers.

main_process_first

( )

Lets the main process go first inside a with block.

The other processes will enter the with block after the main process exits.

no_sync

( model )

Parameters

  • model (torch.nn.Module) — PyTorch Module that was prepared with Accelerator.prepare

A context manager to disable gradient synchronizations across DDP processes by calling torch.nn.parallel.DistributedDataParallel.no_sync.

If model is not wrapped in DDP, this context manager does nothing.

pad_across_processes

( tensor dim = 0 pad_index = 0 pad_first = False )

Parameters

  • tensor (nested list/tuple/dictionary of torch.Tensor) — The data to pad, typically before it is gathered.
  • dim (int, optional, defaults to 0) — The dimension on which to pad.
  • pad_index (int, optional, defaults to 0) — The value with which to pad.
  • pad_first (bool, optional, defaults to False) — Whether to pad at the beginning or the end.

Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so they can safely be gathered.

prepare

( *args )

Prepare all objects passed in args for distributed training and mixed precision, then return them in the same order.

Accepts the following type of objects:

  • torch.utils.data.DataLoader: PyTorch Dataloader
  • torch.nn.Module: PyTorch Module
  • torch.optim.Optimizer: PyTorch Optimizer

print

( *args **kwargs )

Use in place of print() to print only once per server.

reduce

( tensor: Tensor reduction = 'sum' )

Parameters

  • tensor (torch.Tensor) — The tensors to reduce across all processes.
  • reduction (str, optional, defaults to “sum”) — A reduction type, can be one of ‘sum’, ‘mean’, or ‘none’. If ‘none’, will not perform any operation.

Reduce the values in tensor across all processes based on reduction.

register_for_checkpointing

( *objects )

Registers objects so that they are saved during save_state and restored during load_state.

This is intended for cases where the state is saved and loaded in the same script. It is not designed to be used across different scripts.

Every object must have a load_state_dict and state_dict function to be stored.

save

( obj f )

Parameters

  • obj — The object to save.
  • f (str or os.PathLike) — Where to save the content of obj.

Save the object passed to disk once per machine. Use in place of torch.save.

save_state

( output_dir: str )

Parameters

  • output_dir (str or os.PathLike) — The name of the folder to save all relevant weights and states.

Saves the current states of the model, optimizer, scaler, RNG generators, and registered objects.

unscale_gradients

( optimizer = None )

Parameters

  • optimizer (torch.optim.Optimizer or List[torch.optim.Optimizer], optional) — The optimizer(s) for which to unscale gradients. If not set, will unscale gradients on all optimizers that were passed to prepare().

Unscale the gradients in mixed precision training with AMP. This is a no-op in all other settings.

unwrap_model

( model )

Parameters

  • model (torch.nn.Module) — The model to unwrap.

Unwraps the model from the additional layer possibly added by prepare(). Useful before saving the model.

wait_for_everyone

( )

Will stop the execution of the current process until every other process has reached that point (so this does nothing when the script is only run in one process). Useful to do before saving a model.