Accelerate

Run your raw PyTorch training script on any kind of device

Features

  • πŸ€— Accelerate provides an easy API to make your scripts run with mixed precision and on any kind of distributed setup (multi-GPU, TPU, etc.) while still letting you write your own training loop. The same code can then run seamlessly on your local machine for debugging or in your training environment (see the mixed-precision sketch after this list).

  • πŸ€— Accelerate also provides a CLI tool that allows you to quickly configure and test your training environment, then launch your scripts.
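
For example, enabling mixed precision is a single argument away. A minimal sketch, assuming a version of πŸ€— Accelerate whose Accelerator constructor accepts a mixed_precision argument (older releases used fp16=True instead):

from accelerate import Accelerator

# "fp16" enables mixed precision via native AMP; pass "no" to disable it
accelerator = Accelerator(mixed_precision="fp16")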

Easy to integrate

A traditional training loop in PyTorch looks like this:

my_model.to(device)

for batch in my_training_dataloader:
    my_optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = my_model(inputs)
    loss = my_loss_function(outputs, targets)
    loss.backward()
    my_optimizer.step()

Changing it to work with πŸ€— Accelerate is really easy and only adds a few lines of code:

+ from accelerate import Accelerator

+ accelerator = Accelerator()
  # Use the device given by the *accelerator* object.
+ device = accelerator.device
  my_model.to(device)
  # Pass every important object (model, optimizer, dataloader) to *accelerator.prepare*
+ my_model, my_optimizer, my_training_dataloader = accelerator.prepare(
+     my_model, my_optimizer, my_training_dataloader
+ )

  for batch in my_training_dataloader:
      my_optimizer.zero_grad()
      inputs, targets = batch
      inputs = inputs.to(device)
      targets = targets.to(device)
      outputs = my_model(inputs)
      loss = my_loss_function(outputs, targets)
      # Just a small change for the backward instruction
-     loss.backward()
+     accelerator.backward(loss)
      my_optimizer.step()

With these few changes, your script can now run in a distributed environment (multi-GPU, TPU).
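
Note that in a distributed run every process executes the whole script, so one-off side effects such as logging should usually be restricted to a single process. A small sketch, assuming a version of πŸ€— Accelerate that exposes is_main_process and the print helper on the Accelerator object:

from accelerate import Accelerator

accelerator = Accelerator()

# accelerator.print only prints on the main process
accelerator.print(f"Training on {accelerator.num_processes} process(es)")

if accelerator.is_main_process:
    # main-process-only work, e.g. saving a checkpoint
    pass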

You can even simplify your script a bit by letting πŸ€— Accelerate handle the device placement for you (which is safer, especially for TPU training):

+ from accelerate import Accelerator

+ accelerator = Accelerator()
- my_model.to(device)
  # Pass every important object (model, optimizer, dataloader) to *accelerator.prepare*
+ my_model, my_optimizer, my_training_dataloader = accelerator.prepare(
+     my_model, my_optimizer, my_training_dataloader
+ )

  for batch in my_training_dataloader:
      my_optimizer.zero_grad()
      inputs, targets = batch
-     inputs = inputs.to(device)
-     targets = targets.to(device)
      outputs = my_model(inputs)
      loss = my_loss_function(outputs, targets)
      # Just a small change for the backward instruction
-     loss.backward()
+     accelerator.backward(loss)
      my_optimizer.step()
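
Putting it all together, here is a minimal end-to-end sketch; the toy model, random data, and hyperparameters are illustrative and not part of the library:

import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator

# Illustrative toy setup: a linear model on random regression data
my_model = torch.nn.Linear(10, 1)
my_optimizer = torch.optim.SGD(my_model.parameters(), lr=1e-3)
my_training_dataloader = DataLoader(
    TensorDataset(torch.randn(256, 10), torch.randn(256, 1)),
    batch_size=32,
    shuffle=True,
)
my_loss_function = torch.nn.MSELoss()

accelerator = Accelerator()
# prepare() wraps the objects for whatever hardware was configured
my_model, my_optimizer, my_training_dataloader = accelerator.prepare(
    my_model, my_optimizer, my_training_dataloader
)

for batch in my_training_dataloader:
    my_optimizer.zero_grad()
    inputs, targets = batch
    outputs = my_model(inputs)
    loss = my_loss_function(outputs, targets)
    accelerator.backward(loss)
    my_optimizer.step()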

Script launcher

No need to remember how to use torch.distributed.launch or to write a specific launcher for TPU training! πŸ€— Accelerate comes with a CLI tool that will make your life easier when launching distributed scripts.

On your machine(s) just run:

accelerate config

and answer the questions asked. This generates a config file that is then used automatically to set the proper default options when you run

accelerate launch my_script.py --args_to_my_script

For instance, here is how you would run the NLP example (from the root of the repo):

accelerate launch examples/nlp_example.py
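
Most of the configured defaults can also be overridden from the command line; for instance, accelerate launch accepts flags such as --num_processes (run accelerate launch --help to see what your installed version supports):

accelerate launch --multi_gpu --num_processes 2 examples/nlp_example.py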

Supported integrations

  • CPU only
  • single GPU
  • multi-GPU on one node (machine)
  • multi-GPU on several nodes (machines)
  • TPU
  • FP16 with native AMP (apex on the roadmap)
  • DeepSpeed (experimental support)