## Gradient accumulation
When performing gradient accumulation in a distributed setup, there are many opportunities for efficiency mistakes to occur. `Accelerator` provides a context manager that takes care of the details for you and ensures the model trains correctly. Simply wrap your training loop in the `Accelerator.accumulate` context manager, passing in the model you are training, and the gradients will accumulate and synchronize automatically when needed.

```diff
  from accelerate import Accelerator

  accelerator = Accelerator(
+     gradient_accumulation_steps=2,
  )
  dataloader, model, optimizer, scheduler = accelerator.prepare(
      dataloader, model, optimizer, scheduler
  )

  for batch in dataloader:
+     with accelerator.accumulate(model):
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()
          optimizer.zero_grad()
```

To learn more, check out the related documentation:

- API reference
- Example script
- Performing automatic gradient accumulation
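For reference, here is a minimal, self-contained sketch of the same pattern that can be run end to end. The toy linear model, synthetic dataset, optimizer, scheduler, and loss function are illustrative stand-ins (they are not part of the example above); only the `Accelerator` calls mirror the pattern shown in the diff.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Accumulate gradients over 2 batches before each synchronized update.
accelerator = Accelerator(gradient_accumulation_steps=2)

# Illustrative toy setup: a small regression problem on random data.
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=8)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4)
loss_function = torch.nn.MSELoss()

# Let Accelerate wrap the objects for the current (possibly distributed) setup.
dataloader, model, optimizer, scheduler = accelerator.prepare(
    dataloader, model, optimizer, scheduler
)

for batch in dataloader:
    # Inside `accumulate`, gradient synchronization is skipped until the
    # configured number of batches has been seen; the prepared optimizer and
    # scheduler only perform a real update on the synchronization step.
    with accelerator.accumulate(model):
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```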