Experiment trackers
There are many experiment tracking APIs available; however, getting them all to work in a multi-processing environment can often be complex.
Accelerate provides a general tracking API that can be used to log useful items during your script through Accelerator.log().
Integrated Trackers
Currently Accelerate supports seven trackers out-of-the-box:
- TensorBoard
- WandB
- CometML
- Aim
- MLFlow
- ClearML
- DVCLive
To use any of them, pass the selected type(s) to the log_with parameter of Accelerator:
from accelerate import Accelerator
from accelerate.utils import LoggerType
accelerator = Accelerator(log_with="all") # For all available trackers in the environment
accelerator = Accelerator(log_with="wandb")
accelerator = Accelerator(log_with=["wandb", LoggerType.TENSORBOARD])
At the start of your experiment, Accelerator.init_trackers() should be used to set up your project and, optionally, to record any experiment hyperparameters to be logged:
hps = {"num_iterations": 5, "learning_rate": 1e-2}
accelerator.init_trackers("my_project", config=hps)
When you are ready to log any data, Accelerator.log() should be used. A step can also be passed in to correlate the data with a particular step in the training loop.
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=1)
Once you’ve finished training, make sure to run Accelerator.end_training() so that every tracker can run its finish routine, if it has one.
accelerator.end_training()
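If training can fail partway through, one way to make sure the finish step still runs is a try/finally block. The sketch below is dependency-free: StubAccelerator and run_training are hypothetical stand-ins written for this illustration and are not part of Accelerate; only the log() and end_training() method names mirror the calls shown above.

```python
class StubAccelerator:
    """Hypothetical stand-in for Accelerator; it only records calls."""

    def __init__(self):
        self.logged = []
        self.finished = False

    def log(self, values, step=None):
        self.logged.append((step, dict(values)))

    def end_training(self):
        self.finished = True


def run_training(accelerator, num_steps, fail_at=None):
    """Log one value per step and always close the trackers afterwards."""
    try:
        for step in range(num_steps):
            if step == fail_at:
                raise RuntimeError("simulated training failure")
            accelerator.log({"train_loss": 1.0 / (step + 1)}, step=step)
    finally:
        # Runs on success and on failure, so trackers get a chance to finish.
        accelerator.end_training()


stub = StubAccelerator()
try:
    run_training(stub, num_steps=5, fail_at=3)
except RuntimeError:
    pass  # the failure is expected in this demonstration
```

In a real script the same try/finally shape would presumably wrap the training loop, with the actual Accelerator object in place of the stub.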
A full example is below:
from accelerate import Accelerator
accelerator = Accelerator(log_with="all")
config = {
    "num_iterations": 5,
    "learning_rate": 1e-2,
    "loss_function": str(my_loss_function),
}
accelerator.init_trackers("example_project", config=config)
my_model, my_optimizer, my_training_dataloader = accelerator.prepare(my_model, my_optimizer, my_training_dataloader)
device = accelerator.device
my_model.to(device)
# num_iterations is an integer, so iterate over range(...)
for iteration in range(config["num_iterations"]):
    for step, batch in enumerate(my_training_dataloader):
        my_optimizer.zero_grad()
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = my_model(inputs)
        loss = my_loss_function(outputs, targets)
        accelerator.backward(loss)
        my_optimizer.step()
        accelerator.log({"training_loss": loss}, step=step)
accelerator.end_training()
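In the full example above, step comes from enumerate and therefore restarts at 0 on every outer iteration, so values logged in different iterations share the same step numbers. If a single increasing step is preferred, one possible approach is to derive a global step from both loop counters. The sketch below assumes the dataloader has a fixed, known length; global_step and the batches list are hypothetical names used only for illustration.

```python
def global_step(iteration, step, steps_per_iteration):
    """Map (outer iteration, inner step) to one increasing step counter."""
    return iteration * steps_per_iteration + step


num_iterations = 3
batches = ["batch_a", "batch_b", "batch_c", "batch_d"]  # stand-in for a dataloader

logged_steps = []
for iteration in range(num_iterations):
    for step, _batch in enumerate(batches):
        # In a real loop this value would be passed as step= to accelerator.log()
        logged_steps.append(global_step(iteration, step, len(batches)))
```

A plain counter incremented once per batch would work as well and does not need the dataloader length.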
If a tracker requires a directory to save data to, such as TensorBoard, then pass the directory path to project_dir. The project_dir parameter is useful when there are other configurations to be combined with in the ProjectConfiguration data class. For example, you can save the TensorBoard data to project_dir and everything else can be logged in the logging_dir parameter of utils.ProjectConfiguration:
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration
accelerator = Accelerator(log_with="tensorboard", project_dir=".")
# use with ProjectConfiguration
config = ProjectConfiguration(project_dir=".", logging_dir="another/directory")
accelerator = Accelerator(log_with="tensorboard", project_config=config)
Implementing Custom Trackers
To implement a new tracker to be used in Accelerator, create one by subclassing the GeneralTracker class.
Every tracker must implement three functions and have three properties:
- __init__:
  - Should store a run_name and initialize the tracker API of the integrated library.
  - If a tracker stores its data locally (such as TensorBoard), a logging_dir parameter can be added.
- store_init_configuration:
  - Should take in a values dictionary and store it as a one-time experiment configuration.
- log:
  - Should take in a values dictionary and a step, and should log them to the run.
- name (str):
  - A unique string name for the tracker, such as "wandb" for the wandb tracker.
  - This will be used for interacting with this tracker specifically.
- requires_logging_directory (bool):
  - Whether a logging_dir is needed for this particular tracker and if it uses one.
- tracker:
  - This should be implemented as a @property function.
  - Should return the internal tracking mechanism the library uses, such as the run object for wandb.
If the logger should only be executed on the main process, for instance, each method should also make use of the state.PartialState class.
A brief example can be seen below with an integration with Weights and Biases, containing only the relevant information and logging just on the main process:
from accelerate.tracking import GeneralTracker, on_main_process
from typing import Optional
import wandb
class MyCustomTracker(GeneralTracker):
    name = "wandb"
    requires_logging_directory = False
    @on_main_process
    def __init__(self, run_name: str):
        self.run_name = run_name
        # Keep the run on the instance so the tracker property can return it
        self.run = wandb.init(project=self.run_name)
    @property
    def tracker(self):
        return self.run
    @on_main_process
    def store_init_configuration(self, values: dict):
        # wandb.config is an object, not a callable; update it with the values
        wandb.config.update(values)
    @on_main_process
    def log(self, values: dict, step: Optional[int] = None):
        wandb.log(values, step=step)
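To make the contract above easier to see without any tracking library installed, here is a dependency-free sketch of a tracker that keeps everything in memory. It deliberately does not subclass GeneralTracker so that it runs on its own; InMemoryTracker, main_process_only, and the explicit is_main_process flag are all hypothetical names invented for this illustration. In Accelerate the process information comes from the library's own state rather than from a constructor argument.

```python
from functools import wraps
from typing import Optional


def main_process_only(method):
    """Hypothetical stand-in for a main-process decorator: skip the call elsewhere."""

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.is_main_process:
            return method(self, *args, **kwargs)
        return None

    return wrapper


class InMemoryTracker:
    # The three properties: name, requires_logging_directory, tracker
    name = "in_memory"
    requires_logging_directory = False

    def __init__(self, run_name: str, is_main_process: bool = True):
        self.run_name = run_name
        self.is_main_process = is_main_process  # assumption: passed in by hand here
        self._run = {"config": {}, "history": []}

    @property
    def tracker(self):
        # The "internal tracking mechanism" is just a dictionary in this sketch
        return self._run

    @main_process_only
    def store_init_configuration(self, values: dict):
        self._run["config"].update(values)

    @main_process_only
    def log(self, values: dict, step: Optional[int] = None):
        self._run["history"].append({"step": step, **values})


main = InMemoryTracker("demo", is_main_process=True)
worker = InMemoryTracker("demo", is_main_process=False)
for t in (main, worker):
    t.store_init_configuration({"learning_rate": 1e-2})
    t.log({"train_loss": 1.12}, step=1)
```

Only the instance flagged as the main process records anything; the other instance accepts the same calls and silently ignores them, which is the behavior the decorator is meant to illustrate.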
When you are ready to build your Accelerator object, pass an instance of your tracker to the log_with parameter so that it is used automatically with the API:
tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=tracker)
These can also be mixed with existing trackers, including with "all":
tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=[tracker, "all"])
Accessing the internal tracker
If you need to interact with a tracker directly, you can quickly access one using the Accelerator.get_tracker() method. Just pass in the string corresponding to a tracker’s .name attribute and it will return that tracker on the main process.
This example shows doing so with wandb:
wandb_tracker = accelerator.get_tracker("wandb")
From there you can interact with wandb’s run object like normal:
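if accelerator.is_main_process:
    # The tracker property exposes the underlying wandb run object
    wandb_tracker.tracker.log_artifact(some_artifact_to_log)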
If you want to truly remove Accelerate’s wrapping entirely, you can achieve the same outcome with:
wandb_tracker = accelerator.get_tracker("wandb", unwrap=True)
if accelerator.is_main_process:
    wandb_tracker.log_artifact(some_artifact_to_log)
When a wrapper cannot work
If a library has an API that does not follow a strict .log with an overall dictionary, such as Neptune.AI, logging can be done manually under an if accelerator.is_main_process statement:
  from accelerate import Accelerator
+ import neptune
  accelerator = Accelerator()
+ run = neptune.init_run(...)
  my_model, my_optimizer, my_training_dataloader = accelerator.prepare(my_model, my_optimizer, my_training_dataloader)
  device = accelerator.device
  my_model.to(device)
  total_loss = 0
  for iteration in range(config["num_iterations"]):
      for batch in my_training_dataloader:
          my_optimizer.zero_grad()
          inputs, targets = batch
          inputs = inputs.to(device)
          targets = targets.to(device)
          outputs = my_model(inputs)
          loss = my_loss_function(outputs, targets)
          total_loss += loss
          accelerator.backward(loss)
          my_optimizer.step()
+         if accelerator.is_main_process:
+             run["logs/training/batch/loss"].log(loss)
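When several metrics need to go to such a path-addressed API, the manual logging above could be gathered into a small helper that fans a dictionary out into one call per key. The sketch below is an assumption-laden illustration: FakeRun and FakeSeries are hypothetical stand-ins that only mimic the run["path"].log(value) shape used in the example above, and log_values is a hypothetical helper name, not part of Accelerate or Neptune.

```python
class FakeSeries:
    """Hypothetical stand-in for one logged series."""

    def __init__(self):
        self.values = []

    def log(self, value):
        self.values.append(value)


class FakeRun:
    """Hypothetical stand-in for a run object addressed by string paths."""

    def __init__(self):
        self._series = {}

    def __getitem__(self, path):
        # Create the series on first access, then reuse it
        return self._series.setdefault(path, FakeSeries())


def log_values(run, values, prefix, is_main_process):
    """Send each key of a dictionary to its own path, on the main process only."""
    if not is_main_process:
        return
    for key, value in values.items():
        run[f"{prefix}/{key}"].log(float(value))


run = FakeRun()
metrics = {"loss": 0.5, "accuracy": 0.9}
log_values(run, metrics, "logs/training/batch", is_main_process=True)
log_values(run, metrics, "logs/training/batch", is_main_process=False)  # ignored
```

In a real script the is_main_process argument would presumably be accelerator.is_main_process, and run would be the object returned by the tracking library.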