Run `accelerate config` and answer the questionnaire accordingly. Below is an example YAML for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
```
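If you prefer to set this up in code rather than via `accelerate config`, roughly the same settings can be expressed with accelerate's `DeepSpeedPlugin`. The snippet below is a minimal sketch, not part of the original guide, and assumes the plugin exposes fields matching the `deepspeed_config` keys above:

```python
from accelerate import Accelerator, DeepSpeedPlugin

# Sketch (not from the original guide): mirror the deepspeed_config YAML entries in code.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="cpu",
    offload_param_device="cpu",
    zero3_init_flag=True,
    zero3_save_16bit_model=True,
)

# mixed_precision and the number of processes are controlled by the launcher /
# Accelerator rather than by the plugin itself.
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
```

In either case, the training loop itself does not change.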
With the config above in place, a sample training script looks like the following; the comments mark the changes required specifically for ZeRO Stage-3:

```python
from accelerate import Accelerator


def main():
    accelerator = Accelerator()

    # model, optimizer, training_dataloader and scheduler are assumed to be
    # constructed earlier in the script (omitted here for brevity)
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()

    ...

    generated_tokens = accelerator.unwrap_model(model).generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        **gen_kwargs,
        synced_gpus=True,  # required for ZeRO Stage 3
    )
    ...

    accelerator.unwrap_model(model).save_pretrained(
        args.output_dir,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
        state_dict=accelerator.get_state_dict(model),  # required for ZeRO Stage 3
    )

    ...


if __name__ == "__main__":
    main()
```

Launching a script using the default Accelerate config file looks like the following:

```bash
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```

Alternatively, you can use `accelerate launch` with the right config params for multi-GPU training, as shown below:

```bash
accelerate launch \
  --use_deepspeed \
  --num_processes=8 \
  --mixed_precision=fp16 \
  --zero_stage=3 \
  --gradient_accumulation_steps=1 \
  --gradient_clipping=1 \
  --zero3_init_flag=True \
  --zero3_save_16bit_model=True \
  --offload_optimizer_device=cpu \
  --offload_param_device=cpu \
  {script_name.py} {--arg1} {--arg2} ...
```

For core DeepSpeed features supported via the Accelerate config file, no code changes are required for ZeRO Stages 1 and 2. For ZeRO Stage-3, transformers' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` param, because the model parameters are sharded across the GPUs. Advanced users who want granular control can instead use a DeepSpeed config file, whose location you can pass when running the `accelerate config` command. You can also set most of the fields in the DeepSpeed config file to `auto`; they are then filled in automatically from the arguments of the `accelerate launch` command and the `accelerator.prepare` call, which keeps things simple. Please refer to the docs on the DeepSpeed Config File; a sketch of such a file using `auto` values is shown at the end of this section.

To learn more, check out the related documentation:

- How to use DeepSpeed
- Accelerate Large Model Training using DeepSpeed
- DeepSpeed Utilities
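As a rough illustration of the `auto` mechanism mentioned above, a DeepSpeed config file might set fields like the sketch below. This is an assumption-based example rather than an authoritative one; the exact selection of keys follows DeepSpeed's config schema, and the DeepSpeed Config File docs remain the reference:

```json
{
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": "auto",
    "offload_optimizer": { "device": "auto" },
    "offload_param": { "device": "auto" },
    "stage3_gather_16bit_weights_on_model_save": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```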