Accelerate documentation

Execution process

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.30.1).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Execution process

When working with distributed training systems, it is important to manage how and when processes are executed across GPUs. Some processes are completed faster than others, and some processes shouldn’t begin if others haven’t finished yet. Accelerate provides tools for orchestrating when processes are executed to ensure everything remains synchronized across all devices.

This tutorial will teach you how to execute a process on only one machine and how to delay execution until all processes have reached a certain point.

Execute on one process

Certain code only needs to be run once on a given machine, such as printing a log statement or only displaying one progress bar on the local main process.


You should use accelerator.is_local_main_process to indicate code that should only be executed once.

from import tqdm

progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)

You could also wrap a statement with accelerator.is_local_main_process.

For standalone print statements that aren’t wrapped in accelerator.is_local_main_process, replace print with Accelerate’s print() method to only print once per process.

if accelerator.is_local_main_process:
    print("Accelerate is the best")

You could also direct Accelerate to execute code once across all processes regardless of the number of machines. This is useful if you’re uploading a final model to the Hub.


You should use accelerator.is_main_process to indicate code that should only be executed once across all processes.

if accelerator.is_main_process:

Execute on a specific process

Accelerate can also help you execute functions that should only be executed on a specific process or a local process index.

specific process
local process

Use the on_process() method and specify the process index to execute a function on.

def do_my_thing():
    "Something done on process index 0"

Defer execution

When you run your script on several GPUs at the same time, some code may be executed faster than others. You might need to wait for all processes to reach a certain point before executing the next set of instructions. For instance, you shouldn’t save a model before making sure every process is done with training.

To do this, add wait_for_everyone() in your code. This blocks all processes that have finished first from continuing until all remaining processes have reached the same point (this has no effect if you’re running on a single GPU or CPU).

< > Update on GitHub