## Accelerate config
Below is an example YAML config for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
+deepspeed_config:
+  gradient_accumulation_steps: 1
+  gradient_clipping: 1.0
+  offload_optimizer_device: cpu
+  offload_param_device: cpu
+  zero3_init_flag: true
+  zero3_save_16bit_model: true
+  zero_stage: 3
+distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
+num_machines: 1
+num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
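If you prefer to keep the DeepSpeed settings in a standalone DeepSpeed JSON config instead of the plugin fields above, the Accelerate config can point at it via `deepspeed_config_file` (see the "DeepSpeed Config File" link below). A minimal sketch, where the file path and the exact field selection are placeholders:
<pre>
compute_environment: LOCAL_MACHINE
deepspeed_config:
  # placeholder path; point it at your own DeepSpeed JSON config
  deepspeed_config_file: ds_config.json
  zero3_init_flag: true
distributed_type: DEEPSPEED
num_machines: 1
num_processes: 8
</pre>
This is handy when you already maintain DeepSpeed configs, since the ZeRO settings then live in the JSON rather than in the plugin fields.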
## Code changes
Assume that `model` is created using the `transformers` library; a sketch of that surrounding setup follows the snippet below.
<pre>
from accelerate import Accelerator

def main():
    accelerator = Accelerator()
    # prepare() wraps the model in the DeepSpeed engine and shards it per the ZeRO config
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    ...
    generated_tokens = accelerator.unwrap_model(model).generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        **gen_kwargs,
+       synced_gpus=True,  # keep all ranks in step while generating with sharded parameters
    )
    ...
    accelerator.unwrap_model(model).save_pretrained(
        args.output_dir,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
+       state_dict=accelerator.get_state_dict(model),  # gather the sharded ZeRO-3 weights
    )
    ...

if __name__ == "__main__":
    main()
</pre>
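For context, here is a minimal sketch of how `model`, `optimizer`, `training_dataloader`, and `scheduler` might be created before `accelerator.prepare` is called. The checkpoint name, toy data, learning rate, and epoch count are illustrative placeholders, not part of the original example:
<pre>
import torch
from torch.utils.data import DataLoader
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    default_data_collator,
    get_linear_schedule_with_warmup,
)

num_epochs = 3  # illustrative value

# "t5-small" is a placeholder; any checkpoint whose model supports .generate() works
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# toy features just to keep the sketch self-contained; use your real tokenized dataset
inputs = tokenizer(["translate English to German: Hello world."] * 16,
                   padding=True, return_tensors="pt")
labels = tokenizer(["Hallo Welt."] * 16, padding=True, return_tensors="pt").input_ids
features = [
    {"input_ids": inputs.input_ids[i],
     "attention_mask": inputs.attention_mask[i],
     "labels": labels[i]}
    for i in range(labels.shape[0])
]
training_dataloader = DataLoader(
    features, batch_size=8, shuffle=True, collate_fn=default_data_collator
)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=num_epochs * len(training_dataloader),
)
</pre>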
## Launching the training script
If the YAML was generated with the `accelerate config` command (it is then picked up from the default location automatically):
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
If the YAML is saved to a `~/config.yaml` file instead:
```
accelerate launch --config_file ~/config.yaml {script_name.py} {--arg1} {--arg2} ...
```
Or you can use `accelerate launch` with the right configuration parameters passed directly, with no `config.yaml` file at all:
```
accelerate launch \
 --use_deepspeed \
 --num_processes=8 \
 --mixed_precision=fp16 \
 --zero_stage=3 \
 --gradient_accumulation_steps=1 \
 --gradient_clipping=1 \
 --zero3_init_flag=True \
 --zero3_save_16bit_model=True \
 --offload_optimizer_device=cpu \
 --offload_param_device=cpu \
 {script_name.py} {--arg1} {--arg2} ...
```
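Whichever launch style you use, you can add a quick sanity check inside the script to confirm that DeepSpeed was actually picked up from the configuration. A sketch, assuming Accelerate's public `DistributedType` enum and the `deepspeed_plugin` attribute on the accelerator state:
<pre>
from accelerate import Accelerator
from accelerate.utils import DistributedType

accelerator = Accelerator()
if accelerator.distributed_type == DistributedType.DEEPSPEED:
    # the plugin holds the resolved ZeRO settings parsed from the launch config
    accelerator.print(accelerator.state.deepspeed_plugin)
else:
    accelerator.print("DeepSpeed is not enabled for this run")
</pre>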
## Notes
For core DeepSpeed features (ZeRO stages 1 and 2), Accelerate requires no code changes. For ZeRO Stage-3, `transformers`' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` argument, because the model parameters are sharded across the GPUs.
You can also set most of the fields in the DeepSpeed config file to `auto`; they are filled in automatically when you run `accelerate launch`.
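As an illustration of those `auto` values, a DeepSpeed JSON config might look roughly like the sketch below. The field names follow DeepSpeed's ZeRO schema, but treat the exact selection as an assumption and consult the linked docs for the supported set:
<pre>
{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": "auto",
    "offload_optimizer": { "device": "auto" },
    "offload_param": { "device": "auto" },
    "stage3_gather_16bit_weights_on_model_save": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
</pre>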
## Further reading
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a>