Execution process

When working with distributed training systems, it is important to manage how and when processes are executed across GPUs. Some processes finish faster than others, and some shouldn’t begin until others have finished. Accelerate provides tools for orchestrating when processes are executed so that everything remains synchronized across all devices.

This tutorial will teach you how to execute code on only one process and how to delay execution until all processes have reached a certain point.

Execute on one process

Certain code only needs to be run once on a given machine, such as printing a log statement or only displaying one progress bar on the local main process.

You could also direct Accelerate to execute code once across all processes regardless of the number of machines. This is useful if you’re uploading a final model to the Hub.

Execute on a specific process

Accelerate can also restrict a function to a specific process, identified either by its global process index or by its local process index on each machine.

Defer execution

When you run your script on several GPUs at the same time, some processes may run faster than others. You might need to wait for all processes to reach a certain point before executing the next set of instructions. For instance, you shouldn’t save a model before making sure every process is done with training.

To do this, add wait_for_everyone() in your code. This blocks all processes that have finished first from continuing until all remaining processes have reached the same point (this has no effect if you’re running on a single GPU or CPU).

accelerator.wait_for_everyone()