Accelerate documentation

Tracking

There are a large number of experiment tracking APIs available, but getting them all to work in a multi-processing environment can often be complex. Accelerate provides a general tracking API that can be used to log useful items during your script through log().

Integrated Trackers

Currently, Accelerate supports three trackers out of the box:

class accelerate.tracking.TensorBoardTracker

( run_name: str, logging_dir: typing.Union[str, os.PathLike, NoneType] )

Parameters

  • run_name (str) — The name of the experiment run.
  • logging_dir (str or os.PathLike) — Location for TensorBoard logs to be stored.

A Tracker class that supports tensorboard. Should be initialized at the start of your script.

finish

( )

Closes the TensorBoard writer.

log

( values: dict, step: typing.Optional[int] = None )

Parameters

  • values (dict of str to str, float, int, or dict of str to float/int) — Values to be logged as key-value pairs. The values need to have type str, float, int, or dict of str to float/int.
  • step (int, optional) — The run step. If included, the log will be affiliated with this step.

Logs values to the current run.

store_init_configuration

( values: dict )

Parameters

  • values (dict of str to bool, str, float, int, or None) — Values to be stored as initial hyperparameters as key-value pairs. The values need to have type bool, str, float, int, or None.

Logs values as hyperparameters for the run. Should be run at the beginning of your experiment.
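
All three integrated trackers share this interface, so they can also be driven directly outside of an Accelerator. A minimal sketch using TensorBoardTracker (the run name and logging directory here are illustrative; the same log/store_init_configuration/finish calls apply to WandBTracker and CometMLTracker, minus logging_dir):

from accelerate.tracking import TensorBoardTracker

# Logs are written under logging_dir/run_name
tracker = TensorBoardTracker(run_name="my_run", logging_dir="./logs")

# Store the hyperparameters once, at the start of the experiment
tracker.store_init_configuration({"learning_rate": 1e-2, "num_iterations": 5})

# Log metrics, optionally tied to a particular step
tracker.log({"train_loss": 1.12}, step=1)

# Close the underlying writer when done
tracker.finish()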

class accelerate.tracking.WandBTracker

( run_name: str )

Parameters

  • run_name (str) — The name of the experiment run.

A Tracker class that supports wandb. Should be initialized at the start of your script.

finish

( )

Closes the wandb writer.

log

( values: dict, step: typing.Optional[int] = None )

Parameters

  • values (dict of str to str, float, int, or dict of str to float/int) — Values to be logged as key-value pairs. The values need to have type str, float, int, or dict of str to float/int.
  • step (int, optional) — The run step. If included, the log will be affiliated with this step.

Logs values to the current run.

store_init_configuration

( values: dict )

Parameters

  • values (dict of str to bool, str, float, int, or None) — Values to be stored as initial hyperparameters as key-value pairs. The values need to have type bool, str, float, int, or None.

Logs values as hyperparameters for the run. Should be run at the beginning of your experiment.

class accelerate.tracking.CometMLTracker

( run_name: str )

Parameters

  • run_name (str) — The name of the experiment run.

A Tracker class that supports comet_ml. Should be initialized at the start of your script.

API keys must be stored in a Comet config file.

finish

( )

Closes the comet-ml writer.

log

( values: dict, step: typing.Optional[int] = None )

Parameters

  • values (dict of str to str, float, int, or dict of str to float/int) — Values to be logged as key-value pairs. The values need to have type str, float, int, or dict of str to float/int.
  • step (int, optional) — The run step. If included, the log will be affiliated with this step.

Logs values to the current run.

store_init_configuration

( values: dict )

Parameters

  • values (dict of str to bool, str, float, int, or None) — Values to be stored as initial hyperparameters as key-value pairs. The values need to have type bool, str, float, int, or None.

Logs values as hyperparameters for the run. Should be run at the beginning of your experiment.

To use any of them, pass in the selected type(s) to the log_with parameter of Accelerator:

from accelerate import Accelerator
from accelerate.utils import LoggerType

accelerator = Accelerator(log_with="all")  # For all available trackers in the environment
accelerator = Accelerator(log_with="wandb")
accelerator = Accelerator(log_with=["wandb", LoggerType.TENSORBOARD])

At the start of your experiment, init_trackers() should be used to set up your project and potentially add any experiment hyperparameters to be logged:

hps = {"num_iterations": 5, "learning_rate": 1e-2}
accelerator.init_trackers("my_project", config=hps)

When you are ready to log any data, log() should be used. A step can also be passed in to correlate the data with a particular step in the training loop.

accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=1)

Once you’ve finished training, make sure to run end_training() so that all trackers can run their finish functionality, if they have any.

accelerator.end_training()

A full example is below:

from accelerate import Accelerator

accelerator = Accelerator(log_with="all")
config = {
    "num_iterations": 5,
    "learning_rate": 1e-2,
    "loss_function": str(my_loss_function),
}

accelerator.init_trackers("example_project", config=config)

my_model, my_optimizer, my_training_dataloader = accelerator.prepare(my_model, my_optimizer, my_training_dataloader)
device = accelerator.device
my_model.to(device)

for iteration in range(config["num_iterations"]):
    for step, batch in enumerate(my_training_dataloader):
        my_optimizer.zero_grad()
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = my_model(inputs)
        loss = my_loss_function(outputs, targets)
        accelerator.backward(loss)
        my_optimizer.step()
        accelerator.log({"training_loss": loss}, step=step)
accelerator.end_training()
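
As with any Accelerate script, this can then be run on your configured setup with accelerate launch; the tracking calls are only executed on the main process, so nothing extra is needed to avoid duplicate logs.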

Implementing Custom Trackers

To implement a new tracker to be used in Accelerator, create a new class that implements the GeneralTracker class. Every tracker must implement three functions:

  • __init__:
    • Should store a run_name and initialize the tracker API of the integrated library.
    • If a tracker stores its data locally (such as TensorBoard), a logging_dir parameter can be added (a file-based sketch follows the example below).
  • store_init_configuration:
    • Should take in a values dictionary and store them as a one-time experiment configuration
  • log:
    • Should take in a values dictionary and a step, and should log them to the run

A brief example can be seen below with an integration with Weights and Biases, containing only the relevant information:

from accelerate.tracking import GeneralTracker
from typing import Optional

import wandb


class MyCustomTracker(GeneralTracker):
    def __init__(self, run_name: str):
        self.run_name = run_name
        wandb.init(project=self.run_name)

    def store_init_configuration(self, values: dict):
        wandb.config.update(values)

    def log(self, values: dict, step: Optional[int] = None):
        wandb.log(values, step=step)
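
As noted above, a tracker that stores its data locally can accept a logging_dir parameter in the same way. A minimal sketch of a file-based tracker, assuming a JSON-lines layout that is purely illustrative and not part of any API:

import json
import os
from typing import Optional

from accelerate.tracking import GeneralTracker


class JSONTracker(GeneralTracker):
    def __init__(self, run_name: str, logging_dir: str):
        self.run_name = run_name
        # Keep each run in its own subdirectory, as TensorBoard does
        self.logging_dir = os.path.join(logging_dir, run_name)
        os.makedirs(self.logging_dir, exist_ok=True)

    def store_init_configuration(self, values: dict):
        # One-time dump of the experiment configuration
        with open(os.path.join(self.logging_dir, "config.json"), "w") as f:
            json.dump(values, f)

    def log(self, values: dict, step: Optional[int] = None):
        # Append one JSON line per logging call
        with open(os.path.join(self.logging_dir, "log.jsonl"), "a") as f:
            f.write(json.dumps({"step": step, **values}) + "\n")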

When you are ready to build your Accelerator object, pass in an instance of your tracker to log_with to have it automatically be used with the API:

tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=tracker)

These can also be mixed with existing trackers, including with "all":

tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=[tracker, "all"])

When a wrapper cannot work

If a library's API does not follow a strict .log call with an overall dictionary, such as Neptune.AI's, logging can be done manually under an if accelerator.is_main_process statement:

from accelerate import Accelerator
+ import neptune.new as neptune

accelerator = Accelerator()
+ run = neptune.init(...)

my_model, my_optimizer, my_training_dataloader = accelerator.prepare(my_model, my_optimizer, my_training_dataloader)
device = accelerator.device
my_model.to(device)

for iteration in range(config["num_iterations"]):
    total_loss = 0
    for batch in my_training_dataloader:
        my_optimizer.zero_grad()
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = my_model(inputs)
        loss = my_loss_function(outputs, targets)
        total_loss += loss
        accelerator.backward(loss)
        my_optimizer.step()
+       if accelerator.is_main_process:
+           run["logs/training/batch/loss"].log(loss)