# Migrating your code to 🤗 Accelerate
This tutorial will detail how to easily convert existing PyTorch code to use 🤗 Accelerate! You’ll see that by just changing a few lines of code, 🤗 Accelerate can perform its magic and get you on your way toward running your code on distributed systems with ease!
## The base training loop
To begin, write out a very basic PyTorch training loop. This example assumes that `training_dataloader`, `model`, `optimizer`, `scheduler`, and `loss_function` have been defined beforehand.
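For instance, these objects might be defined as follows (a minimal sketch using a toy regression setup; the dataset, shapes, and hyperparameters are illustrative assumptions, not part of the tutorial):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data; every shape and hyperparameter below is an illustrative assumption
features = torch.randn(64, 10)
labels = torch.randn(64, 1)
training_dataloader = DataLoader(TensorDataset(features, labels), batch_size=8)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
loss_function = torch.nn.MSELoss()
```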
device = "cuda"
model.to(device)
for batch in training_dataloader:
optimizer.zero_grad()
inputs, targets = batch
inputs = inputs.to(device)
targets = targets.to(device)
outputs = model(inputs)
loss = loss_function(outputs, targets)
loss.backward()
optimizer.step()
scheduler.step()
## Add in 🤗 Accelerate
To start using 🤗 Accelerate, first import and create an [`Accelerator`] instance:
```python
from accelerate import Accelerator

accelerator = Accelerator()
```
The [`Accelerator`] is the main force behind utilizing all the possible options for distributed training!
### Setting the right device
The [`Accelerator`] class knows the right device to move any PyTorch object to at any time, so you should change the definition of `device` to come from [`Accelerator`]:
```diff
- device = 'cuda'
+ device = accelerator.device
  model.to(device)
```
### Preparing your objects
Next, pass all of the important objects related to training into [`~Accelerator.prepare`]. 🤗 Accelerate will make sure everything is set up in the current environment for you to start training:
```python
model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)
```
These objects are returned in the same order they were sent in. By default when using `device_placement=True`, all of the objects that can be sent to the right device will be. If you need to work with data that isn't passed to [`~Accelerator.prepare`] but should be on the active device, you should pass in the `device` you made earlier.
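For example, a tensor created on the fly can be moved manually (a minimal sketch; `extra_tensor` is a hypothetical name, not something defined in this tutorial):

```python
import torch

# `extra_tensor` is hypothetical data that was never passed to `prepare()`
device = accelerator.device
extra_tensor = torch.ones(2, 2).to(device)
```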
Accelerate will only prepare objects that inherit from their respective PyTorch classes (such as `torch.optim.Optimizer`).
### Modifying the training loop
Finally, three lines of code need to be changed in the training loop. 🤗 Accelerate's DataLoader classes will automatically handle the device placement by default, and [`~Accelerator.backward`] should be used for performing the backward pass:
```diff
- inputs = inputs.to(device)
- targets = targets.to(device)
  outputs = model(inputs)
  loss = loss_function(outputs, targets)
- loss.backward()
+ accelerator.backward(loss)
```
With that, your training loop is now ready to use 🤗 Accelerate!
## The finished code
Below is the final version of the converted code:
```python
from accelerate import Accelerator

accelerator = Accelerator()

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    accelerator.backward(loss)
    optimizer.step()
    scheduler.step()
```
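Once the script works on a single machine, the same file can typically be run on your distributed setup with the 🤗 Accelerate CLI: configure the environment once with `accelerate config`, then start training with `accelerate launch`.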
## More Resources
To learn more ways to migrate to 🤗 Accelerate, check out our interactive migration tutorial, which showcases other items to watch out for when using Accelerate and how to handle them quickly.