Migrating your code to 🤗 Accelerate

This tutorial shows how to convert existing PyTorch code to use 🤗 Accelerate. By changing only a few lines, you can run the same training loop on a single device or on a distributed setup.

The base training loop

To begin, write out a very basic PyTorch training loop.

This loop assumes that training_dataloader, model, optimizer, scheduler, and loss_function have already been defined.

device = "cuda"
model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    loss.backward()
    optimizer.step()
    scheduler.step()
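The loop above leaves its inputs undefined. As a minimal sketch for trying it out, the following defines toy stand-ins for each of them. The synthetic data, the linear model, and every hyperparameter here are illustrative assumptions, not part of the original tutorial.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data: 64 samples with 10 features each (illustrative only).
features = torch.randn(64, 10)
labels = torch.randn(64, 1)
training_dataloader = DataLoader(
    TensorDataset(features, labels), batch_size=8, shuffle=True
)

# A single linear layer stands in for a real model.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Halve the learning rate every 4 scheduler steps (arbitrary choice).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.5)
loss_function = nn.MSELoss()
```

With these definitions in place, the loop above runs as written on a machine with a CUDA GPU.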

Add in 🤗 Accelerate

To start using 🤗 Accelerate, first import and create an Accelerator instance:

from accelerate import Accelerator

accelerator = Accelerator()

The Accelerator object is the main entry point to the library: it detects the environment you are running in and exposes the options for distributed training through a single interface.

Setting the right device

The Accelerator class knows the right device to move any PyTorch object to at any time, so you should change the definition of device to come from Accelerator:

- device = "cuda"
+ device = accelerator.device
  model.to(device)

Preparing your objects

Next, pass all of the important objects related to training into prepare(). 🤗 Accelerate will make sure everything is set up in the current environment for you to start training:

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

These objects are returned in the same order they were passed in. With the default device_placement=True, every object that can be moved to the right device is moved automatically. If you need to work with data that isn’t passed to prepare() but should be on the active device, move it to the device you created earlier.

🤗 Accelerate only prepares objects that inherit from their respective PyTorch classes (such as torch.optim.Optimizer).
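If you are unsure whether a custom object qualifies, one option is to check its base class before calling prepare(). This is a plain PyTorch sketch with toy objects; the expected_types dictionary is a helper made up for this example, not part of the library.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy objects standing in for your own (illustrative only).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
training_dataloader = DataLoader(TensorDataset(torch.zeros(8, 4)), batch_size=4)

# Pair each object with the PyTorch base class expected for its role.
expected_types = {
    "model": (model, nn.Module),
    "optimizer": (optimizer, torch.optim.Optimizer),
    "training_dataloader": (training_dataloader, DataLoader),
}
checks = {name: isinstance(obj, base) for name, (obj, base) in expected_types.items()}
print(checks)
# {'model': True, 'optimizer': True, 'training_dataloader': True}
```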

Modifying the training loop

Finally, change three lines in the training loop. The prepared DataLoader places each batch on the right device by default, so the two .to(device) calls can be removed, and accelerator.backward() replaces loss.backward() for the backward pass:

-   inputs = inputs.to(device)
-   targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
-   loss.backward()
+   accelerator.backward(loss)

With that, your training loop is now ready to use 🤗 Accelerate!

The finished code

Below is the final version of the converted code:

from accelerate import Accelerator

accelerator = Accelerator()

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    accelerator.backward(loss)
    optimizer.step()
    scheduler.step()