The Accelerator is the main class provided by πŸ€— Accelerate. It serves as the main entry point for the API. To quickly adapt your script to work on any kind of setup with πŸ€— Accelerate, just:

  1. Initialize an Accelerator object (that we will call accelerator in the rest of this page) as early as possible in your script.

  2. Pass along your model(s), optimizer(s), and dataloader(s) to the prepare() method.

  3. (Optional but best practice) Remove all the cuda() or to(device) calls in your code and let the accelerator handle device placement for you.

  4. Replace the loss.backward() in your code with accelerator.backward(loss).

  5. (Optional, when using distributed evaluation) Gather your predictions and labels with gather() before storing them or using them for metric computation.

This is all that is needed in most cases. For more advanced cases, or for a nicer experience, here are the functions you should search for and replace with the corresponding methods of your accelerator:

  • Built-in print statements should be replaced by print() so messages are only printed once per server.

  • Use is_local_main_process() for statements that should be executed once per server.

  • Use is_main_process() for statements that should be executed once only.

  • Use wait_for_everyone() to make sure all processes reach that point before continuing (useful before a model save for instance).

  • Use unwrap_model() to unwrap your model before saving it.

  • Use save() instead of torch.save.

  • Use clip_grad_norm_() instead of torch.nn.utils.clip_grad_norm_ and clip_grad_value_() instead of torch.nn.utils.clip_grad_value_.