The Command Line
Below is a list of all the available commands in 🤗 Accelerate, with their parameters.
accelerate config

Command:

accelerate config or accelerate-config

Launches a series of prompts to create and save a default_config.yaml configuration file for your training system. Should always be run first on your machine.

Usage:

accelerate config [arguments]

Optional Arguments:

- --config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with "huggingface".
- -h, --help (bool): Show a help message and exit.
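For example, a minimal run that saves the answers to the prompts at a custom location (the path here is only illustrative):

```bash
# Answer the interactive prompts and write the result to a custom config path
accelerate config --config_file ~/my_configs/accelerate_config.yaml
```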
accelerate env

Command:

accelerate env or accelerate-env

Lists the contents of the passed 🤗 Accelerate configuration file. Should always be used when opening an issue on the GitHub repository.

Usage:

accelerate env [arguments]

Optional Arguments:

- --config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with "huggingface".
- -h, --help (bool): Show a help message and exit.
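For example, to print the environment and the contents of a specific configuration file when filing an issue (the path is illustrative):

```bash
# Print the 🤗 Accelerate environment for a given config file
accelerate env --config_file ~/my_configs/accelerate_config.yaml
```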
accelerate launch

Command:

accelerate launch or accelerate-launch

Launches a specified script on a distributed system with the right parameters.

Usage:

accelerate launch [arguments] {training_script} --{training_script-argument-1} --{training_script-argument-2} ...

Positional Arguments:

- {training_script}: The full path to the script to be launched in parallel.
- --{training_script-argument-1}: Arguments of the training script.

Optional Arguments:

- -h, --help (bool): Show a help message and exit.
- --config_file CONFIG_FILE (str): The config file to use for the default values in the launching script.
- -m, --module (bool): Change each process to interpret the launch script as a Python module, executing with the same behavior as "python -m".
- --no_python (bool): Skip prepending the training script with "python" and execute it directly. Useful when the script is not a Python script.
- --debug (bool): Whether to print out the torch.distributed stack trace when something fails.
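As a minimal sketch, assuming a hypothetical training script named train.py that defines its own --epochs argument:

```bash
# Launch train.py with the settings stored in the default config file;
# everything after the script path is forwarded to the script itself
accelerate launch train.py --epochs 3
```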
The rest of these arguments are configured through accelerate config and are read in from the specified --config_file (or the default configuration) for their values. They can also be passed in manually.
Hardware Selection Arguments:

- --cpu (bool): Whether or not to force the training on the CPU.
- --multi_gpu (bool): Whether or not this should launch a distributed GPU training.
- --mps (bool): Whether or not this should use MPS-enabled GPU devices on macOS machines.
- --tpu (bool): Whether or not this should launch a TPU training.
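For instance, forcing a multi-GPU launch or a CPU-only run of the same hypothetical train.py script:

```bash
# Distributed GPU training
accelerate launch --multi_gpu train.py

# CPU-only training
accelerate launch --cpu train.py
```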
Resource Selection Arguments:

The following arguments are useful for fine-tuning how available hardware should be used:

- --mixed_precision {no,fp16,bf16} (str): Whether or not to use mixed precision training. Choose between FP16 and BF16 (bfloat16) training. BF16 training is only supported on NVIDIA Ampere GPUs and PyTorch 1.10 or later.
- --num_processes NUM_PROCESSES (int): The total number of processes to be launched in parallel.
- --num_machines NUM_MACHINES (int): The total number of machines used in this training.
- --num_cpu_threads_per_process NUM_CPU_THREADS_PER_PROCESS (int): The number of CPU threads per process. Can be tuned for optimal performance.
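For example, combining these with the hardware flags above (the number of processes and the script name are illustrative):

```bash
# Two processes with bf16 mixed precision on a multi-GPU machine
accelerate launch --multi_gpu --num_processes 2 --mixed_precision bf16 train.py
```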
Training Paradigm Arguments:

The following arguments are useful for selecting which training paradigm to use:

- --use_deepspeed (bool): Whether or not to use DeepSpeed for training.
- --use_fsdp (bool): Whether or not to use FullyShardedDataParallel for training.
- --use_megatron_lm (bool): Whether or not to use Megatron-LM for training.
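For example, selecting a paradigm on the command line for the same hypothetical script:

```bash
# Train with DeepSpeed
accelerate launch --use_deepspeed train.py

# Train with FullyShardedDataParallel
accelerate launch --use_fsdp train.py
```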
Distributed GPU Arguments:

The following arguments are only useful when multi_gpu is passed or multi-GPU training is configured through accelerate config:

- --gpu_ids (str): What GPUs (by id) should be used for training on this machine, as a comma-separated list.
- --same_network (bool): Whether all machines used for multinode training exist on the same local network.
- --machine_rank MACHINE_RANK (int): The rank of the machine on which this script is launched.
- --main_process_ip MAIN_PROCESS_IP (str): The IP address of the machine of rank 0.
- --main_process_port MAIN_PROCESS_PORT (int): The port to use to communicate with the machine of rank 0.
- --rdzv_conf (str): Additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,...).
- --max_restarts (int): Maximum number of worker group restarts before failing.
- --monitor_interval (float): Interval, in seconds, to monitor the state of workers.
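A sketch of a two-machine launch, assuming four GPUs per machine; the IP address, port, and script name are placeholders:

```bash
# Run this on the machine of rank 0; repeat on the other machine with --machine_rank 1
accelerate launch --multi_gpu \
  --num_machines 2 --num_processes 8 \
  --machine_rank 0 \
  --main_process_ip 192.168.1.2 --main_process_port 29500 \
  train.py
```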
TPU Arguments:

The following arguments are only useful when tpu is passed or TPU training is configured through accelerate config:

- --main_training_function MAIN_TRAINING_FUNCTION (str): The name of the main function to be executed in your script.
- --downcast_bf16 (bool): When using bf16 precision on TPUs, whether both float and double tensors are cast to bfloat16, or double tensors remain as float32.
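A minimal sketch, assuming the training entry point in the hypothetical train.py is a function called main:

```bash
# TPU launch that calls the `main` function of train.py
accelerate launch --tpu --main_training_function main train.py
```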
DeepSpeed Arguments:

The following arguments are only useful when use_deepspeed is passed or DeepSpeed is configured through accelerate config:

- --deepspeed_config_file (str): DeepSpeed config file.
- --zero_stage (int): DeepSpeed's ZeRO optimization stage.
- --offload_optimizer_device (str): Decides where (none|cpu|nvme) to offload optimizer states.
- --offload_param_device (str): Decides where (none|cpu|nvme) to offload parameters.
- --gradient_accumulation_steps (int): Number of gradient accumulation steps used in your training script.
- --gradient_clipping (float): Gradient clipping value used in your training script.
- --zero3_init_flag (str): Decides whether (true|false) to enable deepspeed.zero.Init for constructing massive models. Only applicable with DeepSpeed ZeRO Stage-3.
- --zero3_save_16bit_model (str): Decides whether (true|false) to save 16-bit model weights when using ZeRO Stage-3. Only applicable with DeepSpeed ZeRO Stage-3.
- --deepspeed_hostfile (str): DeepSpeed hostfile for configuring multi-node compute resources.
- --deepspeed_exclusion_filter (str): DeepSpeed exclusion filter string when using a multi-node setup.
- --deepspeed_inclusion_filter (str): DeepSpeed inclusion filter string when using a multi-node setup.
- --deepspeed_multinode_launcher (str): DeepSpeed multi-node launcher to use.
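A sketch of overriding a few DeepSpeed values on the command line; the stage, step count, and clipping value are only illustrative:

```bash
# ZeRO stage 2 with CPU optimizer offload, gradient accumulation, and clipping
accelerate launch --use_deepspeed \
  --zero_stage 2 \
  --offload_optimizer_device cpu \
  --gradient_accumulation_steps 4 \
  --gradient_clipping 1.0 \
  train.py
```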
Fully Sharded Data Parallelism Arguments:

The following arguments are only useful when use_fsdp is passed or Fully Sharded Data Parallelism is configured through accelerate config:

- --fsdp_offload_params (str): Decides whether (true|false) to offload parameters and gradients to CPU.
- --fsdp_min_num_params (int): FSDP's minimum number of parameters for Default Auto Wrapping.
- --fsdp_sharding_strategy (int): FSDP's Sharding Strategy.
- --fsdp_auto_wrap_policy (str): FSDP's auto wrap policy.
- --fsdp_transformer_layer_cls_to_wrap (str): Transformer layer class name (case-sensitive) to wrap, e.g. BertLayer, GPTJBlock, T5Block ...
- --fsdp_backward_prefetch_policy (str): FSDP's backward prefetch policy.
- --fsdp_state_dict_type (str): FSDP's state dict type.
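A sketch of an FSDP launch; the sharding strategy and parameter threshold are illustrative values, not recommendations:

```bash
# FSDP with CPU offload and size-based default auto wrapping
accelerate launch --use_fsdp \
  --fsdp_offload_params true \
  --fsdp_sharding_strategy 1 \
  --fsdp_min_num_params 100000000 \
  train.py
```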
Megatron-LM Arguments:

The following arguments are only useful when use_megatron_lm is passed or Megatron-LM is configured through accelerate config:

- --megatron_lm_tp_degree: Megatron-LM's Tensor Parallelism (TP) degree.
- --megatron_lm_pp_degree: Megatron-LM's Pipeline Parallelism (PP) degree.
- --megatron_lm_num_micro_batches: Megatron-LM's number of micro batches when PP degree > 1.
- --megatron_lm_sequence_parallelism: Decides whether (true|false) to enable Sequence Parallelism when TP degree > 1.
- --megatron_lm_recompute_activations: Decides whether (true|false) to enable Selective Activation Recomputation.
- --megatron_lm_use_distributed_optimizer: Decides whether (true|false) to use a distributed optimizer which shards optimizer state and gradients across Data Parallel (DP) ranks.
- --megatron_lm_gradient_clipping: Megatron-LM's gradient clipping value based on global L2 Norm (0 to disable).
AWS SageMaker Arguments:

The following arguments are only useful when training in SageMaker:

- --aws_access_key_id AWS_ACCESS_KEY_ID (str): The AWS_ACCESS_KEY_ID used to launch the Amazon SageMaker training job.
- --aws_secret_access_key AWS_SECRET_ACCESS_KEY (str): The AWS_SECRET_ACCESS_KEY used to launch the Amazon SageMaker training job.
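A sketch, assuming your configuration is already set up for SageMaker through accelerate config; the credential values are placeholders:

```bash
# Launch a SageMaker training job with explicit credentials
accelerate launch \
  --aws_access_key_id <your-access-key-id> \
  --aws_secret_access_key <your-secret-access-key> \
  train.py
```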
accelerate tpu-config

Command:

accelerate tpu-config

Usage:

accelerate tpu-config [arguments]

Optional Arguments:

- -h, --help (bool): Show a help message and exit.

Config Arguments:

Arguments that can be configured through accelerate config.

- --config_file (str): Path to the config file to use for accelerate.
- --tpu_name (str): The name of the TPU to use. If not specified, will use the TPU specified in the config file.
- --tpu_zone (str): The zone of the TPU to use. If not specified, will use the zone specified in the config file.

TPU Arguments:

Arguments for options run inside the TPU.

- --command_file (str): The path to the file containing the commands to run on the pod on startup.
- --command (str): A command to run on the pod. Can be passed multiple times.
- --install_accelerate (bool): Whether to install accelerate on the pod. Defaults to False.
- --accelerate_version (str): The version of accelerate to install on the pod. If not specified, will use the latest PyPI version. Specify "dev" to install from GitHub.
- --debug (bool): If set, will print the command that would be run instead of running it.
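A sketch of configuring a pod; the TPU name, zone, and startup command are placeholders:

```bash
# Install accelerate on the pod and run a startup command on it
accelerate tpu-config \
  --tpu_name my-tpu --tpu_zone us-central1-a \
  --install_accelerate \
  --command "pip install -r requirements.txt"
```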
accelerate test

Command:

accelerate test or accelerate-test

Runs accelerate/test_utils/test_script.py to verify that 🤗 Accelerate has been properly configured on your system and runs.

Usage:

accelerate test [arguments]

Optional Arguments:

- --config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with "huggingface".
- -h, --help (bool): Show a help message and exit.
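For example, running the test against a specific configuration file (the path is illustrative):

```bash
# Verify that the saved configuration launches correctly
accelerate test --config_file ~/my_configs/accelerate_config.yaml
```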