Helpful Utilities

Below are a variety of utility functions that 🤗 Accelerate provides, broken down by use-case.

Data Classes

These are basic dataclasses used throughout 🤗 Accelerate and they can be passed in as parameters.

class accelerate.DistributedType

( value names = None module = None qualname = None type = None start = 1 )

Represents a type of distributed environment.

Values:

NO — Not a distributed environment, just a single process.
MULTI_CPU — Distributed on multiple CPU nodes.
MULTI_GPU — Distributed on multiple GPUs.
MULTI_NPU — Distributed on multiple NPUs.
MULTI_XPU — Distributed on multiple XPUs.
DEEPSPEED — Using DeepSpeed.
TPU — Distributed on TPUs.

class accelerate.utils.LoggerType

( value names = None module = None qualname = None type = None start = 1 )

Represents a type of supported experiment tracker.

Values:

ALL — all available trackers in the environment that are supported
TENSORBOARD — TensorBoard as an experiment tracker
WANDB — wandb as an experiment tracker
COMETML — comet_ml as an experiment tracker

class accelerate.utils.PrecisionType

( value names = None module = None qualname = None type = None start = 1 )

Represents a type of precision used on floating point values.

Values:

NO — using full precision (FP32)
FP16 — using half precision
BF16 — using brain floating point precision

class accelerate.utils.ProjectConfiguration

( project_dir: str = None logging_dir: str = None automatic_checkpoint_naming: bool = False total_limit: int = None iteration: int = 0 )

Configuration for the Accelerator object based on inner-project needs.

set_directories

( project_dir: str = None )

Sets self.project_dir and self.logging_dir to the appropriate values.

Data Manipulation and Operations

These include data operations that mimic the same torch ops but can be used on distributed processes.

accelerate.utils.broadcast

( tensor from_process: int = 0 )

Parameters

tensor (nested list/tuple/dictionary of torch.Tensor) — The data to broadcast.
from_process (int, optional, defaults to 0) — The process from which to send the data.

Recursively broadcast tensor in a nested list/tuple/dictionary of tensors to all devices.

accelerate.utils.concatenate

( data dim = 0 )

Parameters

data (nested list/tuple/dictionary of lists of torch.Tensor) — The data to concatenate.
dim (int, optional, defaults to 0) — The dimension on which to concatenate.

Recursively concatenate the tensors in a nested list/tuple/dictionary of lists of tensors with the same shape.
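
To make the recursion concrete, here is a minimal pure-Python sketch in which flat lists stand in for tensors. `concatenate_nested` is a hypothetical helper written for illustration only; it is not the library's implementation, it treats every list as a leaf "tensor", and it only covers concatenation along the first dimension.

```python
from typing import Any, List


def concatenate_nested(data: List[Any]) -> Any:
    """Concatenate a list of identically structured items, leaf by leaf.

    Flat Python lists stand in for tensors, so "concatenating" two leaves
    means joining them end to end (the dim=0 case).
    """
    first = data[0]
    if isinstance(first, dict):
        # Recurse per key: collect that key's value from every item.
        return {key: concatenate_nested([item[key] for item in data]) for key in first}
    if isinstance(first, tuple):
        # Recurse per position and keep the tuple container type.
        return tuple(
            concatenate_nested([item[i] for item in data]) for i in range(len(first))
        )
    # Leaf case: every item is a "tensor" (a flat list), so join them.
    joined: List[Any] = []
    for leaf in data:
        joined.extend(leaf)
    return joined


# Two batches with the same nested structure.
batches = [
    {"logits": [1, 2], "labels": ([0], [1])},
    {"logits": [3, 4], "labels": ([1], [0])},
]
result = concatenate_nested(batches)
print(result)  # {'logits': [1, 2, 3, 4], 'labels': ([0, 1], [1, 0])}
```

The output keeps the structure of a single item: containers are preserved and only the leaves grow.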

accelerate.utils.gather

( tensor )

Parameters

tensor (nested list/tuple/dictionary of torch.Tensor) — The data to gather.

Recursively gather tensor in a nested list/tuple/dictionary of tensors from all devices.

accelerate.utils.pad_across_processes

( tensor dim = 0 pad_index = 0 pad_first = False )

Parameters

tensor (nested list/tuple/dictionary of torch.Tensor) — The data to pad.
dim (int, optional, defaults to 0) — The dimension on which to pad.
pad_index (int, optional, defaults to 0) — The value with which to pad.
pad_first (bool, optional, defaults to False) — Whether to pad at the beginning or the end.

Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so they can safely be gathered.
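
The effect of `pad_index` and `pad_first` can be sketched without any distributed setup. In the example below, each inner list stands in for the 1-D tensor held by one process, and `pad_to_common_length` is a hypothetical helper, not the library function.

```python
from typing import List


def pad_to_common_length(
    per_process: List[List[int]], pad_index: int = 0, pad_first: bool = False
) -> List[List[int]]:
    """Pad every list to the length of the longest one.

    Each inner list represents the data one process would hold.
    """
    max_len = max(len(values) for values in per_process)
    padded = []
    for values in per_process:
        padding = [pad_index] * (max_len - len(values))
        # pad_first=True puts the padding before the data, otherwise after it.
        padded.append(padding + values if pad_first else values + padding)
    return padded


tensors = [[1, 2, 3], [4], [5, 6]]
padded_end = pad_to_common_length(tensors, pad_index=-1)
padded_start = pad_to_common_length(tensors, pad_index=-1, pad_first=True)
print(padded_end)    # [[1, 2, 3], [4, -1, -1], [5, 6, -1]]
print(padded_start)  # [[1, 2, 3], [-1, -1, 4], [-1, 5, 6]]
```

Once every process holds data of the same size, a gather can stack the results without shape mismatches.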

accelerate.utils.reduce

( tensor reduction = 'mean' )

Parameters

tensor (nested list/tuple/dictionary of torch.Tensor) — The data to reduce.
reduction (str, optional, defaults to "mean") — A reduction method. Can be one of “mean”, “sum”, or “none”.

Recursively reduce the tensors in a nested list/tuple/dictionary of lists of tensors across all processes using the given reduction operation.
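
As a rough illustration of the "mean" and "sum" reductions, the sketch below combines values element-wise across simulated processes. `reduce_across` is a hypothetical helper, each inner list stands in for one process's tensor, and the "none" option is deliberately left out because its behavior is not described here.

```python
from typing import List


def reduce_across(per_process: List[List[float]], reduction: str = "mean") -> List[float]:
    """Combine one value per process, element by element."""
    # zip(*per_process) groups the i-th element of every process together.
    summed = [sum(values) for values in zip(*per_process)]
    if reduction == "sum":
        return summed
    if reduction == "mean":
        return [total / len(per_process) for total in summed]
    raise ValueError(f"unsupported reduction in this sketch: {reduction!r}")


# Two simulated processes, each holding a two-element "tensor".
losses = [[1.0, 4.0], [3.0, 8.0]]
total = reduce_across(losses, reduction="sum")
average = reduce_across(losses, reduction="mean")
print(total)    # [4.0, 12.0]
print(average)  # [2.0, 6.0]
```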

accelerate.utils.send_to_device

( tensor device non_blocking = False skip_keys = None )

Parameters

tensor (nested list/tuple/dictionary of torch.Tensor) — The data to send to a given device.
device (torch.device) — The device to send the data to.

Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.

Environment Checks

These functionalities check the state of the current working environment including information about the operating system itself, what it can support, and if particular dependencies are installed.

accelerate.utils.is_bf16_available

( ignore_tpu = False )

Checks if bf16 is supported, optionally ignoring the TPU.

accelerate.utils.is_torch_version

( operation: str version: str )

Parameters

operation (str) — A string representation of an operator, such as ">" or "<="
version (str) — A string version of PyTorch

Compares the current PyTorch version to a given reference with an operation.
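
The idea of passing the operator as a string can be sketched with the standard `operator` module. `compare_versions` and `parse_release` are hypothetical helpers, not the library's code, and this sketch only understands plain numeric releases such as "2.1.0" (optionally with a "+cu117"-style suffix); pre-release tags are out of scope.

```python
import operator
from typing import Tuple

# Map the string form of each comparison to the matching function.
OPERATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1), padding short versions with zeros."""
    core = version.split("+")[0]  # drop a local suffix such as "+cu117"
    parts = tuple(int(part) for part in core.split("."))
    return parts + (0,) * (3 - len(parts))


def compare_versions(current: str, operation: str, reference: str) -> bool:
    """Apply the comparison named by `operation` to two version strings."""
    if operation not in OPERATIONS:
        raise ValueError(f"operation must be one of {list(OPERATIONS)}, got {operation!r}")
    return OPERATIONS[operation](parse_release(current), parse_release(reference))


print(compare_versions("1.13.1+cu117", "<", "2.0"))  # True
print(compare_versions("1.9.0", "<", "1.12.0"))      # True
```

Comparing integer tuples rather than raw strings matters: as plain strings, "1.9.0" would sort after "1.12.0".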

accelerate.utils.is_tpu_available

( check_device = True )

Checks if torch_xla is installed and potentially if a TPU is in the environment

Environment Configuration

accelerate.commands.config.default.write_basic_config

( mixed_precision = 'no' save_location: str = '/github/home/.cache/huggingface/accelerate/default_config.yaml' use_xpu: bool = False )

Parameters

mixed_precision (str, optional, defaults to “no”) — Mixed Precision to use. Should be one of “no”, “fp16”, or “bf16”
save_location (str, optional, defaults to default_json_config_file) — Optional custom save location. Should be passed to --config_file when using accelerate launch. Default location is inside the huggingface cache folder (~/.cache/huggingface) but can be overridden by setting the HF_HOME environment variable, followed by accelerate/default_config.yaml.
use_xpu (bool, optional, defaults to False) — Whether to use XPU if available.

Creates and saves a basic cluster config to be used on a local machine with potentially multiple GPUs. Will also set CPU if it is a CPU-only machine.

When setting up 🤗 Accelerate for the first time, write_basic_config() can be used as a quick alternative to running accelerate config.

Memory

accelerate.utils.get_max_memory

( max_memory: typing.Union[typing.Dict[typing.Union[int, str], typing.Union[int, str]], NoneType] = None )

Get the maximum memory available if nothing is passed, converts string to int otherwise.
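
The string-to-int conversion mentioned above can be pictured with the following sketch. `size_to_bytes` and `normalize_max_memory` are hypothetical helpers, and the set of accepted unit suffixes (GiB/MiB/KiB and GB/MB/KB) is an assumption of this example rather than something stated in this reference.

```python
from typing import Dict, Union

# Assumed unit suffixes for this sketch: binary first, then decimal.
UNITS = {
    "GIB": 2**30,
    "MIB": 2**20,
    "KIB": 2**10,
    "GB": 10**9,
    "MB": 10**6,
    "KB": 10**3,
}


def size_to_bytes(size: Union[int, str]) -> int:
    """Return `size` as a number of bytes; ints pass through unchanged."""
    if isinstance(size, int):
        return size
    text = size.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError(f"unrecognized size string: {size!r}")


def normalize_max_memory(
    max_memory: Dict[Union[int, str], Union[int, str]]
) -> Dict[Union[int, str], int]:
    """Convert every value of a device-to-memory mapping to an int."""
    return {device: size_to_bytes(limit) for device, limit in max_memory.items()}


normalized = normalize_max_memory({0: "10GiB", "cpu": "30GB", 1: 1024})
print(normalized)  # {0: 10737418240, 'cpu': 30000000000, 1: 1024}
```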

accelerate.find_executable_batch_size

( function: callable = None starting_batch_size: int = 128 )

Parameters

function (callable, optional) — A function to wrap
starting_batch_size (int, optional) — The batch size to try and fit into memory

A basic decorator that will try to execute function. If it fails from exceptions related to out-of-memory or CUDNN, the batch size is cut in half and passed to function again.

function must take in a batch_size parameter as its first argument.

Example:

>>> from accelerate.utils import find_executable_batch_size


>>> @find_executable_batch_size(starting_batch_size=128)
... def train(batch_size, model, optimizer):
...     ...


>>> train(model, optimizer)
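
The retry behavior can be sketched in plain Python. `halve_on_oom` below is a hypothetical stand-in, not the library's decorator: `MemoryError` plays the role of the out-of-memory and CUDNN errors, and raising `RuntimeError` once the batch size reaches zero is an assumption of this sketch.

```python
import functools


def halve_on_oom(function=None, starting_batch_size=128):
    """Retry `function` with a halved batch size after each simulated OOM."""
    if function is None:
        # Called with arguments, e.g. @halve_on_oom(starting_batch_size=64).
        return functools.partial(halve_on_oom, starting_batch_size=starting_batch_size)

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        batch_size = starting_batch_size
        while batch_size > 0:
            try:
                # batch_size is injected as the first positional argument.
                return function(batch_size, *args, **kwargs)
            except MemoryError:
                batch_size //= 2
        raise RuntimeError("No executable batch size found.")

    return wrapper


attempts = []


@halve_on_oom(starting_batch_size=128)
def train(batch_size, limit):
    attempts.append(batch_size)
    if batch_size > limit:
        raise MemoryError("simulated out-of-memory")
    return batch_size


chosen = train(40)  # the caller does not pass batch_size
print(attempts)  # [128, 64, 32]
print(chosen)    # 32
```

This also shows why the decorated function is called without `batch_size`: the wrapper supplies it on every attempt.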

Modeling

These utilities relate to interacting with PyTorch models

accelerate.utils.extract_model_from_parallel

( model keep_fp32_wrapper: bool = True ) → torch.nn.Module

Parameters

model (torch.nn.Module) — The model to extract.
keep_fp32_wrapper (bool, optional, defaults to True) — Whether to keep the mixed precision hooks on the model. If False, they are removed.

Returns

torch.nn.Module

The extracted model.

Extract a model from its distributed containers.

accelerate.utils.get_max_layer_size

( modules: typing.List[typing.Tuple[str, torch.nn.modules.module.Module]] module_sizes: typing.Dict[str, int] no_split_module_classes: typing.List[str] ) → Tuple[int, List[str]]

Parameters

modules (List[Tuple[str, torch.nn.Module]]) — The list of named modules where we want to determine the maximum layer size.
module_sizes (Dict[str, int]) — A dictionary mapping each layer name to its size (as generated by compute_module_sizes).
no_split_module_classes (List[str]) — A list of class names for layers we don’t want to be split.

Returns

Tuple[int, List[str]]

The maximum size of a layer with the list of layer names realizing that maximum size.

Utility function that will scan a list of named modules and return the maximum size used by one full layer. A layer is defined as either:

- a module with no direct children (just parameters and buffers)
- a module whose class name is in the list no_split_module_classes
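
The layer definition above can be traced on a toy module tree. This is a sketch under stated assumptions, not the library's code: `FakeModule` is a hypothetical stand-in for torch.nn.Module that only records a class name and its direct children, and `max_layer_size` mirrors the documented inputs and outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module."""
    class_name: str
    children: Dict[str, "FakeModule"] = field(default_factory=dict)


def max_layer_size(
    modules: List[Tuple[str, FakeModule]],
    module_sizes: Dict[str, int],
    no_split_module_classes: List[str],
) -> Tuple[int, List[str]]:
    """Return the largest layer size and the names of the layers reaching it."""
    max_size, layer_names = 0, []
    to_visit = list(modules)
    while to_visit:
        name, module = to_visit.pop(0)
        # A layer has no direct children or belongs to a no-split class.
        is_layer = not module.children or module.class_name in no_split_module_classes
        if is_layer:
            size = module_sizes[name]
            if size > max_size:
                max_size, layer_names = size, [name]
            elif size == max_size:
                layer_names.append(name)
        else:
            # Not a layer itself: look at its direct children next.
            children = [(f"{name}.{key}", child) for key, child in module.children.items()]
            to_visit = children + to_visit
    return max_size, layer_names


block = FakeModule("Block", {"attn": FakeModule("Linear"), "mlp": FakeModule("Linear")})
head = FakeModule("Sequential", {"proj": FakeModule("Linear"), "norm": FakeModule("LayerNorm")})
modules = [("embed", FakeModule("Embedding")), ("block", block), ("head", head)]
sizes = {
    "embed": 100, "block": 150, "block.attn": 60, "block.mlp": 90,
    "head": 160, "head.proj": 150, "head.norm": 10,
}
largest = max_layer_size(modules, sizes, no_split_module_classes=["Block"])
print(largest)  # (150, ['block', 'head.proj'])
```

Here `block` counts as one layer because its class is in the no-split list, while `head` is split into its children, so its own size of 160 is never a candidate.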

accelerate.utils.offload_state_dict

( save_dir: typing.Union[str, os.PathLike] state_dict: typing.Dict[str, torch.Tensor] )

Parameters

save_dir (str or os.PathLike) — The directory in which to offload the state dict.
state_dict (Dict[str, torch.Tensor]) — The dictionary of tensors to offload.

Offload a state dict in a given folder.

Parallel

These include general utilities that should be used when working in parallel.

accelerate.utils.extract_model_from_parallel

( model keep_fp32_wrapper: bool = True ) → torch.nn.Module

Parameters

model (torch.nn.Module) — The model to extract.
keep_fp32_wrapper (bool, optional, defaults to True) — Whether to keep the mixed precision hooks on the model. If False, they are removed.

Returns

torch.nn.Module

The extracted model.

Extract a model from its distributed containers.

accelerate.utils.save

( obj f )

Save the data to disk. Use in place of torch.save().

accelerate.utils.wait_for_everyone

( )

Introduces a blocking point in the script, making sure all processes have reached this point before continuing.

Make sure all processes will reach this instruction otherwise one of your processes will hang forever.

Random

These utilities relate to setting and synchronizing all the random states.

accelerate.utils.set_seed

( seed: int device_specific: bool = False )

Parameters

seed (int) — The seed to set.
device_specific (bool, optional, defaults to False) — Whether to vary the seed slightly on each device using self.process_index.

Helper function for reproducible behavior to set the seed in random, numpy, torch.
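
A minimal sketch of the idea, limited to Python's built-in `random` module. `set_seed_sketch` is a hypothetical helper, and offsetting the seed by the process index is an assumption about how the per-device variation could be done; it is not taken from the library source.

```python
import random


def set_seed_sketch(seed: int, process_index: int = 0, device_specific: bool = False) -> int:
    """Seed Python's RNG and return the seed that was actually used."""
    if device_specific:
        # Assumed scheme: shift the seed by the process index.
        seed += process_index
    random.seed(seed)
    return seed


# Same seed, same sequence: the two draws below are identical.
set_seed_sketch(42)
first_draw = random.random()
set_seed_sketch(42)
second_draw = random.random()
print(first_draw == second_draw)  # True

# With device_specific=True, each simulated process gets its own seed.
per_process_seeds = [set_seed_sketch(42, process_index=i, device_specific=True) for i in range(3)]
print(per_process_seeds)  # [42, 43, 44]
```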

accelerate.utils.synchronize_rng_state

( rng_type: typing.Optional[accelerate.utils.dataclasses.RNGType] = None generator: typing.Optional[torch._C.Generator] = None )

accelerate.synchronize_rng_states

( rng_types: typing.List[typing.Union[str, accelerate.utils.dataclasses.RNGType]] generator: typing.Optional[torch._C.Generator] = None )

PyTorch XLA

These include utilities that are useful while using PyTorch with XLA.

accelerate.utils.install_xla

( upgrade: bool = False )

Parameters

upgrade (bool, optional, defaults to False) — Whether to upgrade torch and install the latest torch_xla wheels.

Helper function to install appropriate xla wheels based on the torch version in Google Colaboratory.

Example:

>>> from accelerate.utils import install_xla

>>> install_xla(upgrade=True)

Loading model weights

These include utilities that are useful to load checkpoints.

accelerate.load_checkpoint_in_model

( model: Module checkpoint: typing.Union[str, os.PathLike] device_map: typing.Union[typing.Dict[str, typing.Union[int, str, torch.device]], NoneType] = None offload_folder: typing.Union[str, os.PathLike, NoneType] = None dtype: typing.Union[str, torch.dtype, NoneType] = None offload_state_dict: bool = False offload_buffers: bool = False keep_in_fp32_modules: typing.List[str] = None offload_8bit_bnb: bool = False )

Parameters

model (torch.nn.Module) — The model in which we want to load a checkpoint.
checkpoint (str or os.PathLike) — The folder checkpoint to load. It can be:
- a path to a file containing a whole model state dict
- a path to a .json file containing the index to a sharded checkpoint
- a path to a folder containing a unique .index.json file and the shards of a checkpoint.
- a path to a folder containing a unique pytorch_model.bin file.
device_map (Dict[str, Union[int, str, torch.device]], optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device.
offload_folder (str or os.PathLike, optional) — If the device_map contains any value "disk", the folder where we will offload weights.
dtype (str or torch.dtype, optional) — If provided, the weights will be converted to that type when loaded.
offload_state_dict (bool, optional, defaults to False) — If True, will temporarily offload the CPU state dict on the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard does not fit.
offload_buffers (bool, optional, defaults to False) — Whether or not to include the buffers in the weights offloaded to disk.
keep_in_fp32_modules (List[str], optional) — A list of the modules that we keep in torch.float32 dtype.
offload_8bit_bnb (bool, optional) — Whether or not to enable offload of 8-bit modules on cpu/disk.

Loads a (potentially sharded) checkpoint inside a model, potentially sending weights to a given device as they are loaded.

Once loaded across devices, you still need to call dispatch_model() on your model to make it able to run. To group the checkpoint loading and dispatch in one single call, use load_checkpoint_and_dispatch().
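
The device_map rule described in the parameters (a module name covers all of its submodules) can be sketched as a walk up the dotted parameter name. `resolve_device` is a hypothetical helper for illustration, the module names are made up, and treating the empty string as a key that covers the whole model is an assumption of this sketch.

```python
from typing import Dict, Union

Device = Union[int, str]


def resolve_device(param_name: str, device_map: Dict[str, Device]) -> Device:
    """Find the device of a parameter by looking up its closest mapped parent."""
    module_name = param_name
    while module_name and module_name not in device_map:
        # Drop the last dotted component: "a.b.weight" becomes "a.b".
        module_name = module_name.rpartition(".")[0]
    if module_name not in device_map:
        raise KeyError(f"{param_name} does not have any device set")
    return device_map[module_name]


# Hypothetical module names, mapped to a GPU index, another GPU, and disk.
device_map = {"encoder": 0, "decoder": 1, "lm_head": "disk"}

print(resolve_device("encoder.layer.3.attention.weight", device_map))  # 0
print(resolve_device("decoder.layers.0.bias", device_map))             # 1
print(resolve_device("lm_head.weight", device_map))                    # disk
```

Any entry with the value "disk" is what makes offload_folder necessary.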

Quantization

These include utilities that are useful for quantizing a model.

accelerate.utils.load_and_quantize_model

( model: Module bnb_quantization_config: BnbQuantizationConfig weights_location: typing.Union[str, os.PathLike] = None device_map: typing.Union[typing.Dict[str, typing.Union[int, str, torch.device]], NoneType] = None no_split_module_classes: typing.Optional[typing.List[str]] = None max_memory: typing.Union[typing.Dict[typing.Union[int, str], typing.Union[int, str]], NoneType] = None offload_folder: typing.Union[str, os.PathLike, NoneType] = None offload_state_dict: bool = False ) → torch.nn.Module

Parameters

model (torch.nn.Module) — Input model. The model can be already loaded or on the meta device
bnb_quantization_config (BnbQuantizationConfig) — The bitsandbytes quantization parameters.
weights_location (str or os.PathLike) — The folder weights_location to load. It can be:
- a path to a file containing a whole model state dict
- a path to a .json file containing the index to a sharded checkpoint
- a path to a folder containing a unique .index.json file and the shards of a checkpoint.
- a path to a folder containing a unique pytorch_model.bin file.
device_map (Dict[str, Union[int, str, torch.device]], optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device.
no_split_module_classes (List[str], optional) — A list of layer class names that should never be split across device (for instance any layer that has a residual connection).
max_memory (Dict, optional) — A dictionary device identifier to maximum memory. Will default to the maximum memory available if unset.
offload_folder (str or os.PathLike, optional) — If the device_map contains any value "disk", the folder where we will offload weights.
offload_state_dict (bool, optional, defaults to False) — If True, will temporarily offload the CPU state dict on the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard does not fit.

Returns

torch.nn.Module

The quantized model

This function will quantize the input model with the associated config passed in bnb_quantization_config. If the model is on the meta device, we will load and dispatch the weights according to the device_map passed. If the model is already loaded, we will quantize the model and put the model on the GPU.

class accelerate.utils.BnbQuantizationConfig

( load_in_8bit: bool = False llm_int8_threshold: float = 6.0 load_in_4bit: bool = False bnb_4bit_quant_type: str = 'fp4' bnb_4bit_use_double_quant: bool = False bnb_4bit_compute_dtype: bool = 'fp16' torch_dtype: dtype = None skip_modules: typing.List[str] = None keep_in_fp32_modules: typing.List[str] = None )

A plugin to enable BitsAndBytes 4-bit and 8-bit quantization.