## Configure 🤗 Accelerate
Run `accelerate config` on your machine and answer the questionnaire accordingly.
Below is an example YAML for running code remotely on AWS SageMaker. Replace the `xxxxx` placeholders with
appropriate values.
<pre>
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
dynamo_backend: 'NO'
ec2_instance_type: ml.p3.2xlarge
gpu_ids: all
iam_role_name: xxxxx
mixed_precision: 'no'
num_machines: 1
profile: xxxxx
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
transformers_version: 4.17.0
use_cpu: false
</pre>
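Before launching, it can help to confirm that no placeholders remain in the generated file. A minimal sketch (the flat-YAML parser and `placeholder_keys` helper below are illustrative, not part of 🤗 Accelerate):

```python
def parse_flat_yaml(text):
    """Parse a flat key: value YAML document (no nesting), which is
    enough for an accelerate config file like the one above."""
    cfg = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip().strip("'\"")
    return cfg

def placeholder_keys(cfg):
    """Return the keys that still hold the xxxxx placeholder."""
    return [k for k, v in cfg.items() if v == "xxxxx"]

config_text = """\
compute_environment: AMAZON_SAGEMAKER
ec2_instance_type: ml.p3.2xlarge
iam_role_name: xxxxx
profile: xxxxx
region: us-east-1
"""

cfg = parse_flat_yaml(config_text)
print(placeholder_keys(cfg))  # -> ['iam_role_name', 'profile']
```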
## Prepare your training script
<pre>
from accelerate import Accelerator
+ import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="sample task")
    parser.add_argument(
        "--pad_to_max_length",
-       action="store_true",
+       type=bool,
+       default=False,
        help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.",
    )
    ...

+ def main():
    accelerator = Accelerator()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
-   torch.save(model.state_dict(), '/opt/ml/model/model.pt')
+   accelerator.save(model.state_dict(), '/opt/ml/model/model.pt')

+ if __name__ == "__main__":
+     main()
</pre>
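Why the `type=bool` change above is needed: SageMaker passes each hyperparameter on the command line as a `--name value` pair, so a bare `store_true` flag would never receive its value. A small sketch of the resulting parsing behaviour:

```python
import argparse

parser = argparse.ArgumentParser(description="sample task")
# SageMaker passes every hyperparameter as "--name value", so the
# argument must accept a value; a store_true flag cannot.
parser.add_argument("--pad_to_max_length", type=bool, default=False)

# SageMaker would invoke the script roughly as:
#   train.py --pad_to_max_length True
args = parser.parse_args(["--pad_to_max_length", "True"])
print(args.pad_to_max_length)  # True
```

Note that `bool("False")` is also truthy in Python, so only pass such a hyperparameter when it should be enabled, or use a custom string-to-bool converter as the `type`.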
Launching a script using the default accelerate config file looks like the following:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
## SageMaker-specific notes
SageMaker doesn’t support argparse actions. To use, for example, boolean hyperparameters, specify the type as bool in your script and provide an explicit True or False value; this is shown above for the `pad_to_max_length` argument. Another important point is to save all output artifacts to `/opt/ml/model` or use `os.environ["SM_MODEL_DIR"]` as your save directory. After training, artifacts in this directory are uploaded to S3, as the code snippet above shows.
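Resolving the save directory from the environment might look like this (a sketch; the `model.pt` filename is just an example):

```python
import os

# On SageMaker, SM_MODEL_DIR points at the model output directory;
# fall back to the documented default /opt/ml/model when it is unset
# (e.g. when running the script locally).
save_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
save_path = os.path.join(save_dir, "model.pt")
print(save_path)
```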
You can provide a custom Docker image, input channels pointing to S3 data locations, and SageMaker metrics logging
as part of the advanced features. Please refer to <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>.
## Further reading
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/sagemaker" target="_blank">How to use 🤗 Accelerate with SageMaker</a>
- <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>
- <a href="https://huggingface.co/docs/accelerate/main/en/package_reference/cli" target="_blank">The Accelerate CLI</a>