## Accelerate config
Below is an example YAML config for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
+deepspeed_config:
+  gradient_accumulation_steps: 1
+  gradient_clipping: 1.0
+  offload_optimizer_device: cpu
+  offload_param_device: cpu
+  zero3_init_flag: true
+  zero3_save_16bit_model: true
+  zero_stage: 3
+distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
+num_machines: 1
+num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
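If you prefer to keep the DeepSpeed settings in a standalone DeepSpeed JSON config instead of the plugin fields above, the Accelerate config can point at it via `deepspeed_config_file` (see the "DeepSpeed Config File" link below). A minimal sketch, where the file path and the exact field selection are placeholders:
<pre>
compute_environment: LOCAL_MACHINE
deepspeed_config:
  # placeholder path; point it at your own DeepSpeed JSON config
  deepspeed_config_file: ds_config.json
  zero3_init_flag: true
distributed_type: DEEPSPEED
num_machines: 1
num_processes: 8
</pre>
This is handy when you already maintain DeepSpeed configs, since the ZeRO settings then live in the JSON rather than in the plugin fields.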
## Code changes
Assume that `model` is created using the `transformers` library; a sketch of that surrounding setup follows the snippet below.
<pre>
from accelerate import Accelerator

def main():
    accelerator = Accelerator()
    # prepare() wraps the model in the DeepSpeed engine and shards it per the ZeRO config
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    ...
    generated_tokens = accelerator.unwrap_model(model).generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        **gen_kwargs,
+       synced_gpus=True,  # keep all ranks in step while generating with sharded parameters
    )
    ...
    accelerator.unwrap_model(model).save_pretrained(
        args.output_dir,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
+       state_dict=accelerator.get_state_dict(model),  # gather the sharded ZeRO-3 weights
    )
    ...

if __name__ == "__main__":
    main()
</pre>
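For context, here is a minimal sketch of how `model`, `optimizer`, `training_dataloader`, and `scheduler` might be created before `accelerator.prepare` is called. The checkpoint name, toy data, learning rate, and epoch count are illustrative placeholders, not part of the original example:
<pre>
import torch
from torch.utils.data import DataLoader
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    default_data_collator,
    get_linear_schedule_with_warmup,
)

num_epochs = 3  # illustrative value

# "t5-small" is a placeholder; any checkpoint whose model supports .generate() works
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# toy features just to keep the sketch self-contained; use your real tokenized dataset
inputs = tokenizer(["translate English to German: Hello world."] * 16,
                   padding=True, return_tensors="pt")
labels = tokenizer(["Hallo Welt."] * 16, padding=True, return_tensors="pt").input_ids
features = [
    {"input_ids": inputs.input_ids[i],
     "attention_mask": inputs.attention_mask[i],
     "labels": labels[i]}
    for i in range(labels.shape[0])
]
training_dataloader = DataLoader(
    features, batch_size=8, shuffle=True, collate_fn=default_data_collator
)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=num_epochs * len(training_dataloader),
)
</pre>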
## Launching the training script
If the YAML was generated with the `accelerate config` command (it is then picked up from the default location automatically):
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
If the YAML is saved to a `~/config.yaml` file instead:
```
accelerate launch --config_file ~/config.yaml {script_name.py} {--arg1} {--arg2} ...
```
Or you can use `accelerate launch` with the right configuration parameters passed directly, with no `config.yaml` file at all:
```
accelerate launch \
 --use_deepspeed \
 --num_processes=8 \
 --mixed_precision=fp16 \
 --zero_stage=3 \
 --gradient_accumulation_steps=1 \
 --gradient_clipping=1 \
 --zero3_init_flag=True \
 --zero3_save_16bit_model=True \
 --offload_optimizer_device=cpu \
 --offload_param_device=cpu \
 {script_name.py} {--arg1} {--arg2} ...
```
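Whichever launch style you use, you can add a quick sanity check inside the script to confirm that DeepSpeed was actually picked up from the configuration. A sketch, assuming Accelerate's public `DistributedType` enum and the `deepspeed_plugin` attribute on the accelerator state:
<pre>
from accelerate import Accelerator
from accelerate.utils import DistributedType

accelerator = Accelerator()
if accelerator.distributed_type == DistributedType.DEEPSPEED:
    # the plugin holds the resolved ZeRO settings parsed from the launch config
    accelerator.print(accelerator.state.deepspeed_plugin)
else:
    accelerator.print("DeepSpeed is not enabled for this run")
</pre>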
## Notes
For core DeepSpeed features (ZeRO stages 1 and 2), Accelerate requires no code changes. For ZeRO Stage-3, `transformers`' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` argument, because the model parameters are sharded across the GPUs.
You can also set most of the fields in the DeepSpeed config file to `auto`; they are filled in automatically when you run `accelerate launch`.
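As an illustration of those `auto` values, a DeepSpeed JSON config might look roughly like the sketch below. The field names follow DeepSpeed's ZeRO schema, but treat the exact selection as an assumption and consult the linked docs for the supported set:
<pre>
{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": "auto",
    "offload_optimizer": { "device": "auto" },
    "offload_param": { "device": "auto" },
    "stage3_gather_16bit_weights_on_model_save": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
</pre>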
## Further reading
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a>