##
Run `accelerate config` and answer the questionnaire accordingly.
Below is an example YAML config for running code remotely on AWS SageMaker. Replace the `xxxxx` placeholders with
appropriate values.

<pre>
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
dynamo_backend: 'NO'
ec2_instance_type: ml.p3.2xlarge
gpu_ids: all
iam_role_name: xxxxx
mixed_precision: 'no'
num_machines: 1
profile: xxxxx
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
transformers_version: 4.17.0
use_cpu: false
</pre>
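The config above is a flat list of `key: value` pairs, so a quick pre-flight check for leftover `xxxxx` placeholders needs only the standard library. The helper below is a hypothetical sketch, not part of 🤗 Accelerate, and it assumes the flat layout shown above (nested YAML is not handled).

```python
# Hypothetical helper, not part of Accelerate: scan a flat "key: value"
# YAML config (like the SageMaker example above) for leftover placeholders.
PLACEHOLDER = "xxxxx"


def find_unfilled_keys(config_text: str, placeholder: str = PLACEHOLDER) -> list[str]:
    """Return the keys whose value is still the placeholder string."""
    unfilled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Ignore surrounding whitespace and optional YAML quotes around the value.
        if value.strip().strip("'\"") == placeholder:
            unfilled.append(key.strip())
    return unfilled


example = """\
base_job_name: accelerate-sagemaker-1
iam_role_name: xxxxx
profile: xxxxx
region: us-east-1
"""
print(find_unfilled_keys(example))  # ['iam_role_name', 'profile']
```

An empty result suggests every placeholder has been filled before you launch.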
##
<pre>
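 import argparse

 from accelerate import Accelerator

 def parse_args():
     parser = argparse.ArgumentParser(description="sample task")

     parser.add_argument(
         "--pad_to_max_length",
-        action="store_true",
+        # SageMaker passes an explicit "True"/"False" string; type=bool would parse "False" as True
+        type=lambda value: value.lower() == "true",
+        default=False,
         help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.",
     )

     ...

+def main():
     accelerator = Accelerator()
     # model, optimizer, training_dataloader, scheduler and loss_function are created as usual (omitted)

     model, optimizer, training_dataloader, scheduler = accelerator.prepare(
         model, optimizer, training_dataloader, scheduler
     )

     for batch in training_dataloader:
         optimizer.zero_grad()
         inputs, targets = batch
         outputs = model(inputs)
         loss = loss_function(outputs, targets)
         accelerator.backward(loss)
         optimizer.step()
         scheduler.step()

-    torch.save(model.state_dict(), '/opt/ml/model/model.pt')
+    # wait for all processes, then save the unwrapped model's weights; SageMaker uploads /opt/ml/model to S3
+    accelerator.wait_for_everyone()
+    accelerator.save(accelerator.unwrap_model(model).state_dict(), '/opt/ml/model/model.pt')

+if __name__ == "__main__":
+    main()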
</pre>
Launching a script with the default Accelerate config file looks like this:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
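Because every hyperparameter needs an explicit value on SageMaker, it can help to build the launch arguments programmatically rather than by hand. The sketch below uses a hypothetical helper, `build_launch_args` (not an Accelerate API), that turns a dict of hyperparameters into `accelerate launch` arguments and writes booleans as explicit `True`/`False` strings; the script name and hyperparameters are illustrative.

```python
import shlex


def build_launch_args(script: str, hyperparams: dict) -> list[str]:
    """Build an `accelerate launch` argument list (hypothetical helper).

    Booleans are rendered as explicit "True"/"False" values instead of bare
    flags, because argparse actions such as store_true are not usable on SageMaker.
    """
    args = ["accelerate", "launch", script]
    for name, value in hyperparams.items():
        # str(True) == "True" and str(False) == "False", so every flag carries a value.
        args.extend([f"--{name}", str(value)])
    return args


cmd = build_launch_args("train.py", {"pad_to_max_length": False, "learning_rate": 5e-5})
print(shlex.join(cmd))
# accelerate launch train.py --pad_to_max_length False --learning_rate 5e-05
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues.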
##
SageMaker doesn't support argparse actions such as `store_true`. If you want to use boolean hyperparameters, declare the argument with an explicit type in your script and pass an explicit `True` or `False` value for it, as shown above for the `pad_to_max_length` argument. Another important point is to save all output artifacts to `/opt/ml/model`, or to use `os.environ["SM_MODEL_DIR"]` as your save directory. After training, the contents of this directory are uploaded to S3; the code snippet above shows an example.
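One caveat when declaring boolean hyperparameters: argparse applies `type` to the raw command-line string, and `bool("False")` evaluates to `True` because the string is non-empty, so an explicit converter is safer. The following is a minimal sketch, assuming SageMaker passes values as `True`/`False` strings; `str_to_bool` and `--output_dir` are hypothetical names, and `/opt/ml/model` is used as a fallback when `SM_MODEL_DIR` is unset (for example, in local runs).

```python
import argparse
import os


def str_to_bool(value: str) -> bool:
    """Convert an explicit "True"/"False" style string into a bool (hypothetical helper)."""
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="sample task")
    # type=str_to_bool instead of type=bool: argparse calls type() on the raw
    # string, and bool("False") is True because the string is non-empty.
    parser.add_argument("--pad_to_max_length", type=str_to_bool, default=False)
    # Hypothetical argument: SageMaker sets SM_MODEL_DIR in the training container,
    # and /opt/ml/model is used when the variable is absent.
    parser.add_argument("--output_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args(argv)


args = parse_args(["--pad_to_max_length", "False"])
print(args.pad_to_max_length)  # False
print(bool("False"))           # True
```

An unrecognized value such as `maybe` makes argparse exit with a usage error instead of silently becoming `True`.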

As advanced features, you can also provide a custom Docker image, define input channels pointing to S3 data locations, and use SageMaker metrics logging.
Please refer to <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>.
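Inside the training container, SageMaker exposes each input channel as a local directory whose path is given by an `SM_CHANNEL_<NAME>` environment variable (by default under `/opt/ml/input/data/<name>`), and SageMaker metrics are extracted from the training logs with regular expressions. The sketch below works under those assumptions; the helper names, the channel name `training`, and the `train_loss=` log format are hypothetical, and the regex would need to match whatever your script actually prints.

```python
import os
import re


def channel_dir(name: str) -> str:
    """Local path of a SageMaker input channel (hypothetical helper).

    Uses the SM_CHANNEL_<NAME> variable set by SageMaker and falls back to the
    default /opt/ml/input/data/<name> location when it is not set.
    """
    return os.environ.get(f"SM_CHANNEL_{name.upper()}", f"/opt/ml/input/data/{name}")


def format_metric(step: int, loss: float) -> str:
    # Hypothetical log format; keep it stable so a metric regex can parse it.
    return f"step={step} train_loss={loss:.4f}"


# Regex a SageMaker metric definition could use to capture the loss value.
TRAIN_LOSS_REGEX = r"train_loss=([0-9.]+)"

line = format_metric(100, 0.25)
print(line)  # step=100 train_loss=0.2500
print(re.search(TRAIN_LOSS_REGEX, line).group(1))  # 0.2500
print(channel_dir("training"))
```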

##
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/sagemaker" target="_blank">How to use 🤗 Accelerate with SageMaker</a>
- <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>
- <a href="https://huggingface.co/docs/accelerate/main/en/package_reference/cli" target="_blank">The Accelerate CLI</a>