Internals

Optimizer

class accelerate.optimizer.AcceleratedOptimizer(optimizer, device_placement=True, scaler=None)

Internal wrapper around a torch optimizer.

Parameters:
- optimizer (torch.optim.optimizer.Optimizer) – The optimizer to wrap.
- device_placement (bool, optional, defaults to True) – Whether or not the optimizer should handle device placement. If so, it will place the state dictionary of optimizer on the right device.
- scaler (torch.cuda.amp.grad_scaler.GradScaler, optional) – The scaler to use in the step function if training with mixed precision.
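In normal use this wrapper is created for you by Accelerator.prepare(), so you rarely instantiate it directly. A minimal sketch of direct use, assuming a plain single-process setup:

    import torch
    from accelerate import Accelerator
    from accelerate.optimizer import AcceleratedOptimizer

    accelerator = Accelerator()  # initializes the shared AcceleratorState

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # The wrapper forwards step() and zero_grad() to the inner optimizer
    # (and to the GradScaler when one is used for mixed precision).
    wrapped = AcceleratedOptimizer(optimizer, device_placement=False)

    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    wrapped.step()
    wrapped.zero_grad()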
DataLoader

The main work on your PyTorch DataLoader is done by the following function:

accelerate.data_loader.prepare_data_loader(dataloader: torch.utils.data.dataloader.DataLoader, device: Optional[torch.device] = None, num_processes: Optional[int] = None, process_index: Optional[int] = None, split_batches: bool = False, put_on_device: bool = False, rng_types: Optional[List[Union[str, accelerate.utils.RNGType]]] = None) → torch.utils.data.dataloader.DataLoader

Wraps a PyTorch DataLoader to generate batches for one of the processes only.

Depending on the value of the drop_last attribute of the dataloader passed, it will either stop the iteration at the first batch that would be too small / not present on all processes, or loop back to indices from the beginning.

Parameters:
- dataloader (torch.utils.data.dataloader.DataLoader) – The data loader to split across several devices.
- device (torch.device) – The target device for the returned DataLoader.
- num_processes (int, optional) – The number of processes running concurrently. Will default to the value given by AcceleratorState.
- process_index (int, optional) – The index of the current process. Will default to the value given by AcceleratorState.
- split_batches (bool, optional, defaults to False) – Whether the resulting DataLoader should split the batches of the original data loader across devices or yield full batches (in which case it will yield the batches starting at the process_index-th one and advancing by num_processes batches at each iteration). Another way to see this is that the observed batch size will be the same as that of the initial dataloader if this option is set to True, and the batch size of the initial dataloader multiplied by num_processes otherwise. Setting this option to True requires that the batch size of the dataloader be a round multiple of num_processes.
- put_on_device (bool, optional, defaults to False) – Whether or not to put the batches on device (only works if the batches are nested lists, tuples or dictionaries of tensors).
- rng_types (list of str or RNGType) – The list of random number generators to synchronize at the beginning of each iteration. Should be one or several of:
  - "torch": the base torch random number generator
  - "cuda": the CUDA random number generator (GPU only)
  - "xla": the XLA random number generator (TPU only)
  - "generator": the torch.Generator of the sampler (or batch sampler if there is no sampler in your dataloader) or of the iterable dataset (if it exists) if the underlying dataset is of that type.

Returns: A new data loader that will yield the portion of the batches meant for the current process.

Return type: torch.utils.data.dataloader.DataLoader

Warning: This does not support BatchSampler with varying batch size yet.
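As a sketch of the sharding behavior (Accelerator.prepare() normally calls this for you), the hypothetical two-process values below are passed explicitly so the example runs in a single process:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator
    from accelerate.data_loader import prepare_data_loader

    accelerator = Accelerator()  # initializes the shared AcceleratorState

    dataset = TensorDataset(torch.arange(16.0))
    dataloader = DataLoader(dataset, batch_size=4)

    # Pretend we are process 0 of 2: we receive batches 0 and 2 of the
    # original loader, while process 1 would receive batches 1 and 3.
    shard = prepare_data_loader(dataloader, num_processes=2, process_index=0)
    for (batch,) in shard:
        print(batch)  # tensor([0., 1., 2., 3.]) then tensor([8., 9., 10., 11.])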
DataLoaderShard

class accelerate.data_loader.DataLoaderShard(*args, **kwds)

Subclass of a PyTorch DataLoader that will deal with device placement and the current distributed setup.

Parameters:
- dataset (torch.utils.data.dataset.Dataset) – The dataset to use to build this dataloader.
- device (torch.device, optional) – If passed, the device to put all batches on.
- rng_types (list of str or RNGType) – The list of random number generators to synchronize at the beginning of each iteration. Should be one or several of:
  - "torch": the base torch random number generator
  - "cuda": the CUDA random number generator (GPU only)
  - "xla": the XLA random number generator (TPU only)
  - "generator": an optional torch.Generator
- generator (torch.Generator, optional) – A random number generator to keep synchronized across processes.
- kwargs – All other keyword arguments to pass to the regular DataLoader initialization.
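A minimal sketch: it is constructed like a regular DataLoader, with a device argument that triggers automatic batch placement (Accelerator.prepare() normally builds this for you):

    import torch
    from torch.utils.data import TensorDataset
    from accelerate import Accelerator
    from accelerate.data_loader import DataLoaderShard

    accelerator = Accelerator()  # ensures the shared state is initialized

    dataset = TensorDataset(torch.randn(8, 2))
    loader = DataLoaderShard(dataset, device=accelerator.device, batch_size=4)
    for (batch,) in loader:
        print(batch.device)  # every batch lands on accelerator.device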
BatchSamplerShard

class accelerate.data_loader.BatchSamplerShard(*args, **kwds)

Wraps a PyTorch BatchSampler to generate batches for one of the processes only. Instances of this class will always yield a number of batches that is a round multiple of num_processes and that all have the same size. Depending on the value of the drop_last attribute of the batch sampler passed, it will either stop the iteration at the first batch that would be too small / not present on all processes, or loop back to indices from the beginning.

Parameters:
- batch_sampler (torch.utils.data.sampler.BatchSampler) – The batch sampler to split into several shards.
- num_processes (int, optional, defaults to 1) – The number of processes running concurrently.
- process_index (int, optional, defaults to 0) – The index of the current process.
- split_batches (bool, optional, defaults to False) – Whether the shards should be created by splitting a batch to give a piece of it on each process, or by yielding different full batches on each process. On two processes with a sampler of [[0, 1, 2, 3], [4, 5, 6, 7]], this results in:
  - the sampler on process 0 yielding [0, 1, 2, 3] and the sampler on process 1 yielding [4, 5, 6, 7] if this argument is set to False;
  - the sampler on process 0 yielding [0, 1] then [4, 5] and the sampler on process 1 yielding [2, 3] then [6, 7] if this argument is set to True.

Warning: This does not support BatchSampler with varying batch size yet.
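The sharding logic is easy to check in isolation; a small sketch reproducing the two-process example above from a single process:

    from torch.utils.data import BatchSampler, SequentialSampler
    from accelerate.data_loader import BatchSamplerShard

    batch_sampler = BatchSampler(
        SequentialSampler(range(8)), batch_size=4, drop_last=False
    )

    # Build the shard each hypothetical process would see.
    for rank in (0, 1):
        shard = BatchSamplerShard(batch_sampler, num_processes=2, process_index=rank)
        print(rank, list(shard))
    # 0 [[0, 1, 2, 3]]
    # 1 [[4, 5, 6, 7]]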
IterableDatasetShard

class accelerate.data_loader.IterableDatasetShard(*args, **kwds)

Wraps a PyTorch IterableDataset to generate samples for one of the processes only. Instances of this class will always yield a number of samples that is a round multiple of the actual batch size (depending on the value of split_batches, this is either batch_size or batch_size x num_processes). Depending on the value of the drop_last attribute of the batch sampler passed, it will either stop the iteration at the first batch that would be too small, or loop back to indices from the beginning.

Parameters:
- dataset (torch.utils.data.dataset.IterableDataset) – The dataset to split into several shards.
- batch_size (int, optional, defaults to 1) – The size of the batches per shard (if split_batches=False) or the size of the batches (if split_batches=True).
- drop_last (bool, optional, defaults to False) – Whether or not to drop the last incomplete batch, or complete the last batches by reusing samples from the beginning.
- num_processes (int, optional, defaults to 1) – The number of processes running concurrently.
- process_index (int, optional, defaults to 0) – The index of the current process.
- split_batches (bool, optional, defaults to False) – Whether the shards should be created by splitting a batch to give a piece of it on each process, or by yielding different full batches on each process. On two processes with an iterable dataset yielding [0, 1, 2, 3, 4, 5, 6, 7], this results in:
  - the shard on process 0 yielding [0, 1, 2, 3] and the shard on process 1 yielding [4, 5, 6, 7] if this argument is set to False;
  - the shard on process 0 yielding [0, 1, 4, 5] and the shard on process 1 yielding [2, 3, 6, 7] if this argument is set to True.
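Similarly, a sketch reproducing the example above with a small iterable dataset, run from a single process:

    from torch.utils.data import IterableDataset
    from accelerate.data_loader import IterableDatasetShard

    class Range(IterableDataset):
        def __iter__(self):
            yield from range(8)

    # With split_batches=False, each shard consumes full batches of
    # batch_size samples out of every batch_size x num_processes samples.
    for rank in (0, 1):
        shard = IterableDatasetShard(
            Range(), batch_size=4, num_processes=2, process_index=rank
        )
        print(rank, list(shard))
    # 0 [0, 1, 2, 3]
    # 1 [4, 5, 6, 7]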
Distributed Config

AcceleratorState

class accelerate.state.AcceleratorState(fp16: bool = None, cpu: bool = False, deepspeed_plugin=None, _from_accelerator: bool = False)

This is a variation of a singleton class, in the sense that all instances of AcceleratorState share the same state, which is initialized on the first instantiation.

Attributes:
- device (torch.device) – The device to use.
- distributed_type (DistributedType) – The type of distributed environment currently in use.
- num_processes (int) – The number of processes currently launched in parallel.
- process_index (int) – The index of the current process.
- local_process_index (int) – The index of the current process on the current server.
- use_fp16 (bool) – Whether or not the current script will use mixed precision.
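A sketch of the shared-state behavior, assuming the state has first been initialized by creating an Accelerator:

    from accelerate import Accelerator
    from accelerate.state import AcceleratorState

    accelerator = Accelerator()  # first instantiation initializes the state

    state = AcceleratorState()
    print(state.device, state.distributed_type, state.num_processes)

    # Every AcceleratorState() created afterwards shares the same state.
    assert AcceleratorState().process_index == state.process_index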
DistributedType

class accelerate.state.DistributedType(value)

Represents a type of distributed environment.

Values:
- NO – Not a distributed environment, just a single process.
- MULTI_CPU – Distributed on multiple CPU nodes.
- MULTI_GPU – Distributed on multiple GPUs.
- DEEPSPEED – Using DeepSpeed.
- TPU – Distributed on TPUs.
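Typical use is to branch on the kind of environment the script was launched in, for example:

    from accelerate import Accelerator
    from accelerate.state import AcceleratorState, DistributedType

    accelerator = Accelerator()
    # In a plain `python script.py` run this branch is taken.
    if AcceleratorState().distributed_type == DistributedType.NO:
        print("running as a single process")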
Utilities

accelerate.utils.extract_model_from_parallel(model)

Extract a model from its distributed containers.

Parameters: model (torch.nn.Module) – The model to extract.

Returns: The extracted model.

Return type: torch.nn.Module
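For instance, it peels a DataParallel (or DistributedDataParallel) container off a model to recover the underlying module:

    import torch
    from accelerate.utils import extract_model_from_parallel

    model = torch.nn.Linear(4, 2)
    wrapped = torch.nn.DataParallel(model)

    # Containers are unwrapped until the plain module is reached.
    assert extract_model_from_parallel(wrapped) is model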
accelerate.utils.gather(tensor)

Recursively gathers the tensors in a nested list/tuple/dictionary of tensors from all devices.

Parameters: tensor (nested list/tuple/dictionary of torch.Tensor) – The data to gather.

Returns: The same data structure as tensor with all the tensors gathered from every process.
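A sketch of typical use inside a script launched on several processes (in a single, non-distributed process it simply returns the tensor unchanged):

    import torch
    from accelerate import Accelerator
    from accelerate.utils import gather

    accelerator = Accelerator()

    # Each process contributes its own tensor; after gathering, every
    # process holds all of them, concatenated along the first dimension.
    local = torch.tensor([accelerator.process_index], device=accelerator.device)
    all_ranks = gather(local)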
accelerate.utils.send_to_device(tensor, device)

Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.

Parameters:
- tensor (nested list/tuple/dictionary of torch.Tensor) – The data to send to a given device.
- device (torch.device) – The device to send the data to.

Returns: The same data structure as tensor with all tensors sent to the proper device.
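For example, a training batch that mixes tensors inside dictionaries and lists can be moved in one call:

    import torch
    from accelerate.utils import send_to_device

    batch = {"x": torch.randn(2, 3), "labels": [torch.tensor(0), torch.tensor(1)]}
    # Every tensor in the nested structure ends up on the target device.
    batch = send_to_device(batch, torch.device("cpu"))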
accelerate.utils.set_seed(seed: int)

Helper function for reproducible behavior that sets the seed in random, numpy and torch.

Parameters: seed (int) – The seed to set.
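For example, call it once at the start of your script:

    from accelerate.utils import set_seed

    # Seeds Python's random module, NumPy and torch (CPU and CUDA) at once.
    set_seed(42)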
accelerate.utils.synchronize_rng_state(rng_type: Optional[accelerate.utils.RNGType] = None, generator: Optional[torch._C.Generator] = None)
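Judging from the signature and the rng_types options documented above, this synchronizes one kind of random number generator across processes; a hedged sketch, assuming a launched multi-process script:

    from accelerate import Accelerator
    from accelerate.utils import synchronize_rng_state, RNGType

    accelerator = Accelerator()

    # Align the base torch RNG on every process so that random operations
    # (e.g. data augmentation) produce the same values everywhere.
    synchronize_rng_state(RNGType.TORCH)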