Accelerator

The Accelerator is the main class provided by 🤗 Accelerate. It serves as the main entry point for the API.

Quick adaptation of your code

To quickly adapt your script to work on any kind of setup with 🤗 Accelerate just:

1. Initialize an Accelerator object (that we will call accelerator throughout this page) as early as possible in your script.
2. Pass your dataloader(s), model(s), optimizer(s), and scheduler(s) to the prepare() method.
3. Remove all the .cuda() or .to(device) calls from your code and let the accelerator handle the device placement for you.

Step three is optional, but considered a best practice.

4. Replace loss.backward() in your code with accelerator.backward(loss).
5. Gather your predictions and labels with gather() before storing them or using them for metric computation.

Step five is mandatory when using distributed evaluation.

In most cases this is all that is needed. The next section lists a few more advanced use cases and useful features. Search your code for each pattern and replace it with the corresponding method of your accelerator:

Advanced recommendations

Printing

print statements should be replaced by accelerator.print() so that each message is printed only once per server:

- print("My thing I want to print!")
+ accelerator.print("My thing I want to print!")

Executing processes

Once on a single server

For statements that should be executed once per server, use is_local_main_process:

if accelerator.is_local_main_process:
    do_thing_once_per_server()

A function can be wrapped using the on_local_main_process() function to achieve the same behavior on a function’s execution:

@accelerator.on_local_main_process
def do_my_thing():
    "Something done once per server"
    do_thing_once_per_server()

Only ever once across all servers

For statements that should only ever be executed once, use is_main_process:

if accelerator.is_main_process:
    do_thing_once()

A function can be wrapped using the on_main_process() function to achieve the same behavior on a function’s execution:

@accelerator.on_main_process
def do_my_thing():
    "Something done once across all servers"
    do_thing_once()

On specific processes

If a function should be run on a specific overall or local process index, there are similar decorators to achieve this:

@accelerator.on_local_process(local_process_idx=0)
def do_my_thing():
    "Something done on process index 0 on each server"
    do_thing_on_index_zero_on_each_server()

@accelerator.on_process(process_index=0)
def do_my_thing():
    "Something done on process index 0"
    do_thing_on_index_zero()

Synchronicity control

Use wait_for_everyone() to make sure all processes reach that point before continuing (useful before saving a model, for instance).

Saving and loading

Use unwrap_model() before saving to remove all special model wrappers added during the distributed process.

model = MyModel()
model = accelerator.prepare(model)
# Unwrap
model = accelerator.unwrap_model(model)

Use save() instead of torch.save:

  state_dict = model.state_dict()
- torch.save(state_dict, "my_state.pkl")
+ accelerator.save(state_dict, "my_state.pkl")

Operations

Use clip_grad_norm_() instead of torch.nn.utils.clip_grad_norm_ and clip_grad_value_() instead of torch.nn.utils.clip_grad_value_.

Gradient Accumulation

To perform gradient accumulation, use accumulate() and specify a gradient_accumulation_steps value when creating the Accelerator. This also automatically ensures the gradients are synced or unsynced during multi-device training, checks whether the step should actually be performed, and auto-scales the loss:

- accelerator = Accelerator()
+ accelerator = Accelerator(gradient_accumulation_steps=2)

  for (inputs, labels) in training_dataloader:
+     with accelerator.accumulate(model):
          predictions = model(inputs)
          loss = loss_function(predictions, labels)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()
          optimizer.zero_grad()

Overall API documentation:

class accelerate.Accelerator

accumulate

autocast

backward

clear

clip_grad_norm_

clip_grad_value_

end_training

free_memory

gather

gather_for_metrics

get_tracker

init_trackers

load_state

local_main_process_first

log

main_process_first

no_sync

on_local_main_process

on_local_process

on_main_process

on_process

pad_across_processes

prepare

print

reduce

register_for_checkpointing

save

save_state

unscale_gradients

unwrap_model

wait_for_everyone