{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "7d29e9ae",
"metadata": {},
"source": [
"---\n",
"title: \"Accelerate, Three Powerful Sublibraries for PyTorch\" \n",
"author: \"Zachary Mueller\"\n",
"format: \n",
" revealjs: \n",
" theme: moon \n",
" fig-format: png\n",
"categories: [Lesson 6]\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "d2aba289-d771-4be9-a4ec-99ab268c5586",
"metadata": {},
"source": [
"## What is 🤗 Accelerate?"
]
},
{
"cell_type": "markdown",
"id": "329b61de-d7c9-46d2-adff-7a912ba93356",
"metadata": {},
"source": [
"```{mermaid}\n",
"%%| fig-height: 6\n",
"graph LR\n",
" A{\"🤗 Accelerate#32;\"}\n",
" A --> B[\"Launching
Interface#32;\"]\n",
" A --> C[\"Training Library#32;\"]\n",
" A --> D[\"Big Model
Inference#32;\"]\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "0480b2df-a19c-4b93-b98a-b98da4d0d825",
"metadata": {},
"source": [
"# A Launching Interface\n",
"\n",
"Can't I just use `python do_the_thing.py`?"
]
},
{
"cell_type": "markdown",
"id": "c6d5b3da-aad3-4387-b9f3-65384b521bb9",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"Launching scripts in different environments is complicated:"
]
},
{
"cell_type": "markdown",
"id": "d2c2079d-ab7d-4e98-94a9-d16093b81ea6",
"metadata": {},
"source": [
"- ```bash \n",
"python script.py\n",
"```\n",
"\n",
"- ```bash \n",
"torchrun --nnodes=1 --nproc_per_node=2 script.py\n",
"```\n",
"\n",
"- ```bash \n",
"deepspeed --num_gpus=2 script.py\n",
"```\n",
"\n",
"And more!"
]
},
{
"cell_type": "markdown",
"id": "77bdbbaa-acaa-4ed3-b809-82e836db93f7",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"But it doesn't have to be:"
]
},
{
"cell_type": "markdown",
"id": "21456afb-7ae6-4bb6-81ea-e6de6365c13f",
"metadata": {},
"source": [
"```bash\n",
"accelerate launch script.py\n",
"```\n",
"\n",
"A single command to launch with `DeepSpeed`, Fully Sharded Data Parallelism, across single and multi CPUs and GPUs, and to train on TPUs[^1] too! \n",
"\n",
"[^1]: Without needing to modify your code and create a `_mp_fn`"
]
},
{
"cell_type": "markdown",
"id": "d77a2576-5bb4-4a64-bcbf-1be9af3a232b",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"Generate a device-specific configuration through `accelerate config`\n",
"\n",
""
]
},
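{
"cell_type": "markdown",
"id": "1a2b3c4d",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"Your answers are saved to a YAML file that `accelerate launch` reads. A hedged sketch of inspecting it from Python (the path below is the default save location on Linux and may differ on your setup):\n",
"\n",
"```{.python}\n",
"from pathlib import Path\n",
"\n",
"# Default location; can vary with the platform or HF_HOME\n",
"config = Path.home() / \".cache/huggingface/accelerate/default_config.yaml\"\n",
"print(config.read_text())\n",
"```"
]
},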
{
"cell_type": "markdown",
"id": "05c4fe0c-7b86-49a0-bd83-b9dab39e406f",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"Or don't. `accelerate config` doesn't *have* to be done!\n",
"\n",
"```bash\n",
"torchrun --nnodes=1 --nproc_per_node=2 script.py\n",
"accelerate launch --multi_gpu --nproc_per_node=2 script.py\n",
"```\n",
"\n",
"A quick default configuration can be made too:\n",
"\n",
"```bash \n",
"accelerate config default\n",
"```"
]
},
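{
"cell_type": "markdown",
"id": "5e6f7a8b",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"If you'd rather stay in Python, a minimal sketch of writing that same default configuration programmatically with `accelerate.utils.write_basic_config` (the `mixed_precision` argument is optional):\n",
"\n",
"```{.python}\n",
"from accelerate.utils import write_basic_config\n",
"\n",
"# Writes the same config file `accelerate config default` would create\n",
"write_basic_config(mixed_precision=\"fp16\")\n",
"```"
]
},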
{
"cell_type": "markdown",
"id": "c6047a52-3582-41c4-96e4-370ef269be94",
"metadata": {},
"source": [
"## A Launching Interface\n",
"\n",
"With the `notebook_launcher` it's also possible to launch code directly from your Jupyter environment too!"
]
},
{
"cell_type": "markdown",
"id": "1097c474-2ec5-4214-9477-ad6bac25317a",
"metadata": {},
"source": [
"```python\n",
"from accelerate import notebook_launcher\n",
"notebook_launcher(\n",
" training_loop_function, \n",
" args, \n",
" num_processes=2\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "d8ac070e-bb81-4624-b826-8cb072646ea7",
"metadata": {},
"source": [
"```python\n",
"Launching training on 2 GPUs.\n",
"epoch 0: 88.12\n",
"epoch 1: 91.73\n",
"epoch 2: 92.58\n",
"epoch 3: 93.90\n",
"epoch 4: 94.71\n",
"```"
]
},
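{
"cell_type": "markdown",
"id": "9c0d1e2f",
"metadata": {},
"source": [
"## A Launching Interface {.smaller}\n",
"\n",
"For reference, a hedged sketch of what such a `training_loop_function` could look like; the model, data, and hyperparameters here are hypothetical stand-ins:\n",
"\n",
"```{.python}\n",
"import torch\n",
"from accelerate import Accelerator\n",
"\n",
"def training_loop_function(mixed_precision=\"fp16\"):\n",
"    # Build everything *inside* the function so that no\n",
"    # process touches the GPU before launch\n",
"    accelerator = Accelerator(mixed_precision=mixed_precision)\n",
"    model = torch.nn.Linear(10, 2)  # stand-in model\n",
"    optimizer = torch.optim.AdamW(model.parameters())\n",
"    dataloader = torch.utils.data.DataLoader(\n",
"        torch.utils.data.TensorDataset(\n",
"            torch.randn(64, 10), torch.randint(0, 2, (64,))\n",
"        ),\n",
"        batch_size=8,\n",
"    )\n",
"    model, optimizer, dataloader = accelerator.prepare(\n",
"        model, optimizer, dataloader\n",
"    )\n",
"    model.train()\n",
"    for inputs, targets in dataloader:\n",
"        optimizer.zero_grad()\n",
"        loss = torch.nn.functional.cross_entropy(model(inputs), targets)\n",
"        accelerator.backward(loss)\n",
"        optimizer.step()\n",
"```"
]
},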
{
"cell_type": "markdown",
"id": "debe8ec0-4078-4f85-835e-f38c102ddfaf",
"metadata": {},
"source": [
"# A Training Library\n",
"\n",
"Okay, will `accelerate launch` make `do_the_thing.py` use all my GPUs magically?"
]
},
{
"cell_type": "markdown",
"id": "7c8d4a16-7b57-4eee-8974-e39ff459c5e5",
"metadata": {},
"source": [
"## A Training Library\n",
"\n",
"- Just showed that its possible using `accelerate launch` to *launch* a python script in various distributed environments\n",
"- This does *not* mean that the script will just \"use\" that code and still run on the new compute efficiently.\n",
"- Training on different computes often means *many* lines of code changed for each specific compute.\n",
"- 🤗 `accelerate` solves this by ensuring the same code can be ran on a CPU or GPU, multiples, and on TPUs!"
]
},
{
"cell_type": "markdown",
"id": "d8f7dfdd-5af5-4f6a-831b-d4e0a8f312d3",
"metadata": {},
"source": [
"## A Training Library\n",
"\n",
"\n",
"```{.python}\n",
"for batch in dataloader:\n",
" optimizer.zero_grad()\n",
" inputs, targets = batch\n",
" inputs = inputs.to(device)\n",
" targets = targets.to(device)\n",
" outputs = model(inputs)\n",
" loss = loss_function(outputs, targets)\n",
" loss.backward()\n",
" optimizer.step()\n",
" scheduler.step()\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "2c13ef82-d4c4-4564-8d3d-ef4c4ad3c9d8",
"metadata": {},
"source": [
"## A Training Library {.smaller}"
]
},
{
"cell_type": "markdown",
"id": "992909f7-8f5c-4138-8f31-94f305f564de",
"metadata": {},
"source": [
":::: {.columns}\n",
"::: {.column width=\"43%\"}\n",
"
\n",
"```{.python code-line-numbers=\"5-6,9\"}\n",
"# For alignment purposes\n",
"for batch in dataloader:\n",
" optimizer.zero_grad()\n",
" inputs, targets = batch\n",
" inputs = inputs.to(device)\n",
" targets = targets.to(device)\n",
" outputs = model(inputs)\n",
" loss = loss_function(outputs, targets)\n",
" loss.backward()\n",
" optimizer.step()\n",
" scheduler.step()\n",
"```\n",
":::\n",
"::: {.column width=\"57%\"}\n",
"```{.python code-line-numbers=\"1-7,12-13,16\"}\n",
"from accelerate import Accelerator\n",
"accelerator = Accelerator()\n",
"dataloader, model, optimizer scheduler = (\n",
" accelerator.prepare(\n",
" dataloader, model, optimizer, scheduler\n",
" )\n",
")\n",
"\n",
"for batch in dataloader:\n",
" optimizer.zero_grad()\n",
" inputs, targets = batch\n",
" # inputs = inputs.to(device)\n",
" # targets = targets.to(device)\n",
" outputs = model(inputs)\n",
" loss = loss_function(outputs, targets)\n",
" accelerator.backward(loss) # loss.backward()\n",
" optimizer.step()\n",
" scheduler.step()\n",
"```\n",
":::\n",
"\n",
"::::"
]
},
{
"cell_type": "markdown",
"id": "4028a9a1-5c25-41a3-9c76-f116d6fbb1db",
"metadata": {},
"source": [
"## A Training Library\n",
"\n",
"What all happened in `Accelerator.prepare`?\n",
"\n",
"::: {.incremental}\n",
"1. `Accelerator` looked at the configuration\n",
"2. The `dataloader` was converted into one that can dispatch each batch onto a seperate GPU\n",
"3. The `model` was wrapped with the appropriate DDP wrapper from either `torch.distributed` or `torch_xla`\n",
"4. The `optimizer` and `scheduler` were both converted into an `AcceleratedOptimizer` and `AcceleratedScheduler` which knows how to handle any distributed scenario\n",
":::"
]
},
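{
"cell_type": "markdown",
"id": "3f4a5b6c",
"metadata": {},
"source": [
"## A Training Library\n",
"\n",
"A quick, hedged sanity check of the above: inspect the prepared objects and a few public `Accelerator` attributes (the exact output depends on your hardware and config):\n",
"\n",
"```{.python}\n",
"dataloader, model, optimizer, scheduler = accelerator.prepare(\n",
"    dataloader, model, optimizer, scheduler\n",
")\n",
"\n",
"print(type(model))      # e.g. DistributedDataParallel on multi-GPU\n",
"print(type(optimizer))  # AcceleratedOptimizer\n",
"\n",
"print(accelerator.device)            # the device this process uses\n",
"print(accelerator.distributed_type)  # e.g. DistributedType.MULTI_GPU\n",
"print(accelerator.num_processes)     # how many processes are running\n",
"```"
]
},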
{
"cell_type": "markdown",
"id": "92e112c3-cdf9-4d84-8076-df33a79da641",
"metadata": {},
"source": [
"## Let's bring in `fastai`\n",
"\n",
"To utilize the `notebook_launcher` and `accelerate` at once it requires a few steps:\n",
"\n",
"1. Migrate the `DataLoaders` creation to inside the `train` function\n",
"2. Use the `distrib_ctx` context manager fastai provides\n",
"3. Train!"
]
},
{
"cell_type": "markdown",
"id": "ba04e9b9-4589-4a08-adc3-cb2b4ec6ad43",
"metadata": {},
"source": [
"## Let's bring `fastai`\n",
"\n",
"Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n",
"\n",
"```{.python}\n",
"from fastai.vision.all import *\n",
"from fastai.distributed import *\n",
"\n",
"path = untar_data(URLs.PETS)/'images'\n",
"\n",
"def train():\n",
" dls = ImageDataLoaders.from_name_func(\n",
" path, get_image_files(path), valid_pct=0.2,\n",
" label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n",
" learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n",
" with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n",
" learn.fine_tune(1)\n",
"\n",
"notebook_launcher(train, num_processes=2)\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "95c138db-6ef1-4c20-ba76-5040deca83e1",
"metadata": {},
"source": [
"## Let's bring `fastai`\n",
"\n",
"Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n",
"\n",
"```{.python code-line-numbers=\"1,5,10,13\"}\n",
"from fastai.vision.all import *\n",
"from fastai.distributed import *\n",
"\n",
"path = untar_data(URLs.PETS)/'images'\n",
"\n",
"def train():\n",
" dls = ImageDataLoaders.from_name_func(\n",
" path, get_image_files(path), valid_pct=0.2,\n",
" label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n",
" learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n",
" with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n",
" learn.fine_tune(1)\n",
"\n",
"notebook_launcher(train, num_processes=2)\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "d4e4a37d-8044-4b0b-91a9-ca8ec8d54895",
"metadata": {},
"source": [
"## Let's bring `fastai`\n",
"\n",
"The key important parts to remember are:\n",
"\n",
"- **No** code should *touch* the GPU before calling `notebook_launcher`\n",
"- Generally it's recommended to let fastai handle gradient accumulation and mixed precision in this case, so use their in-house Callbacks\n",
"- Use the `notebook_launcher` to run the training function after everything is complete."
]
}
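,
{
"cell_type": "markdown",
"id": "7d8e9f0a",
"metadata": {},
"source": [
"## Let's bring in `fastai`\n",
"\n",
"As a hedged sketch of that second point, fastai's built-in callbacks slot straight into the earlier example; `GradientAccumulation(n_acc=8)` below is a hypothetical setting:\n",
"\n",
"```{.python}\n",
"from fastai.vision.all import *\n",
"from fastai.distributed import *\n",
"\n",
"path = untar_data(URLs.PETS)/'images'\n",
"\n",
"def train():\n",
"    dls = ImageDataLoaders.from_name_func(\n",
"        path, get_image_files(path), valid_pct=0.2,\n",
"        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n",
"    # .to_fp16() attaches fastai's MixedPrecision callback\n",
"    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n",
"    with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n",
"        # Let fastai handle gradient accumulation via its callback\n",
"        learn.fine_tune(1, cbs=GradientAccumulation(n_acc=8))\n",
"\n",
"notebook_launcher(train, num_processes=2)\n",
"```"
]
}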
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]"
},
"vscode": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}