Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Tracking

There are a large number of experiment tracking API’s available, however getting them all to work with in a multi-processing environment can oftentimes be complex. 🤗 Accelerate provides a general tracking API that can be used to log useful items during your script through Accelerator.log()

Integrated Trackers

Currently Accelerate supports three trackers out-of-the-box:

  • TensorBoard
  • WandB
  • CometML

To use any of them, pass in the selected type(s) to the log_with parameter in Accelerate:

from accelerate import Accelerator
from accelerate.utils import LoggerType

accelerator = Accelerator(log_with="all")  # For all available trackers in the environment
accelerator = Accelerator(log_with="wandb")
accelerator = Accelerator(log_with=["wandb", LoggerType.TENSORBOARD])

At the start of your experiment Accelerator.init_trackers() should be used to setup your project, and potentially add any experiment hyperparameters to be logged:

hps = {"num_iterations": 5, "learning_rate": 1e-2}
accelerator.init_trackers("my_project", config=hps)

When you are ready to log any data, Accelerator.log() should be used. A step can also be passed in to correlate the data with a particular step in the training loop.

accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=1)

Once you’ve finished training, make sure to run Accelerator.end_training() so that all the trackers can run their finish functionalities if they have any.

accelerator.end_training()

A full example is below:

from accelerate import Accelerator

accelerator = Accelerator(log_with="all")
config = {
    "num_iterations": 5,
    "learning_rate": 1e-2,
    "loss_function": str(my_loss_function),
}

accelerator.init_trackers("example_project", config=config)

my_model, my_optimizer, my_training_dataloader = accelerate.prepare(my_model, my_optimizer, my_training_dataloader)
device = accelerator.device
my_model.to(device)

for iteration in config["num_iterations"]:
    for step, batch in my_training_dataloader:
        my_optimizer.zero_grad()
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = my_model(inputs)
        loss = my_loss_function(outputs, targets)
        accelerator.backward(loss)
        my_optimizer.step()
        accelerator.log({"training_loss": loss}, step=step)
accelerator.end_training()

Implementing Custom Trackers

To implement a new tracker to be used in Accelerator, a new one can be made through implementing the GeneralTracker class. Every tracker must implement three functions and have three properties:

  • __init__:

    • Should store a run_name and initialize the tracker API of the integrated library.
    • If a tracker stores their data locally (such as TensorBoard), a logging_dir parameter can be added.
  • store_init_configuration:

    • Should take in a values dictionary and store them as a one-time experiment configuration
  • log:

    • Should take in a values dictionary and a step, and should log them to the run
  • name (str):

    • A unique string name for the tracker, such as "wandb" for the wandb tracker.
    • This will be used for interacting with this tracker specifically
  • requires_logging_directory (bool):

    • Whether a logging_dir is needed for this particular tracker and if it uses one.
  • tracker:

    • This should be implemented as a @property function
    • Should return the internal tracking mechanism the library uses, such as the run object for wandb.

A brief example can be seen below with an integration with Weights and Biases, containing only the relevant information:

from accelerate.tracking import GeneralTracker
from typing import Optional

import wandb


class MyCustomTracker(GeneralTracker):
    name = "wandb"
    requires_logging_directory = False

    def __init__(self, run_name: str):
        self.run_name = run_name
        run = wandb.init(self.run_name)

    @property
    def tracker(self):
        return self.run.run

    def store_init_configuration(self, values: dict):
        wandb.config(values)

    def log(self, values: dict, step: Optional[int] = None):
        wandb.log(values, step=step)

When you are ready to build your Accelerator object, pass in an instance of your tracker to Accelerator.log_with to have it automatically be used with the API:

tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=tracker)

These also can be mixed with existing trackers, including with "all":

tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=[tracker, "all"])

Accessing the internal tracker

If some custom interactions with a tracker might be wanted directly, you can quickly access one using the Accelerator.get_tracker() method. Just pass in the string corresponding to a tracker’s .name attribute and it will return that tracker on the main process.

This example shows doing so with wandb:

wandb_tracker = accelerator.get_tracker("wandb")

From there you can interact with wandb’s run object like normal:

Make sure to only interact with trackers on the main process!
if accelerator.is_main_process:
    wandb_run.log_artifact(some_artifact_to_log)

When a wrapper cannot work

If a library has an API that does not follow a strict .log with an overall dictionary such as Neptune.AI, logging can be done manually under an if accelerator.is_main_process statement:

  from accelerate import Accelerator
+ import neptune.new as neptune

  accelerator = Accelerator()
+ run = neptune.init(...)

  my_model, my_optimizer, my_training_dataloader = accelerate.prepare(my_model, my_optimizer, my_training_dataloader)
  device = accelerator.device
  my_model.to(device)

  for iteration in config["num_iterations"]:
      for batch in my_training_dataloader:
          my_optimizer.zero_grad()
          inputs, targets = batch
          inputs = inputs.to(device)
          targets = targets.to(device)
          outputs = my_model(inputs)
          loss = my_loss_function(outputs, targets)
          total_loss += loss
          accelerator.backward(loss)
          my_optimizer.step()
+         if accelerator.is_main_process:
+             run["logs/training/batch/loss"].log(loss)