{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "7d29e9ae", "metadata": {}, "source": [ "---\n", "title: \"Accelerate, Three Powerful Sublibraries for PyTorch\" \n", "author: \"Zachary Mueller\"\n", "format: \n", " revealjs: \n", " theme: moon \n", " fig-format: png\n", "categories: [Lesson 6]\n", "---" ] }, { "cell_type": "markdown", "id": "d2aba289-d771-4be9-a4ec-99ab268c5586", "metadata": {}, "source": [ "## What is 🤗 Accelerate?" ] }, { "cell_type": "markdown", "id": "329b61de-d7c9-46d2-adff-7a912ba93356", "metadata": {}, "source": [ "```{mermaid}\n", "%%| fig-height: 6\n", "graph LR\n", " A{\"🤗 Accelerate#32;\"}\n", " A --> B[\"Launching
Interface#32;\"]\n", " A --> C[\"Training Library#32;\"]\n", " A --> D[\"Big Model
Inference#32;\"]\n", "```" ] }, { "cell_type": "markdown", "id": "0480b2df-a19c-4b93-b98a-b98da4d0d825", "metadata": {}, "source": [ "# A Launching Interface\n", "\n", "Can't I just use `python do_the_thing.py`?" ] }, { "cell_type": "markdown", "id": "c6d5b3da-aad3-4387-b9f3-65384b521bb9", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Launching scripts in different environments is complicated:" ] }, { "cell_type": "markdown", "id": "d2c2079d-ab7d-4e98-94a9-d16093b81ea6", "metadata": {}, "source": [ "- ```bash \n", "python script.py\n", "```\n", "\n", "- ```bash \n", "torchrun --nnodes=1 --nproc_per_node=2 script.py\n", "```\n", "\n", "- ```bash \n", "deepspeed --num_gpus=2 script.py\n", "```\n", "\n", "And more!" ] }, { "cell_type": "markdown", "id": "77bdbbaa-acaa-4ed3-b809-82e836db93f7", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "But it doesn't have to be:" ] }, { "cell_type": "markdown", "id": "21456afb-7ae6-4bb6-81ea-e6de6365c13f", "metadata": {}, "source": [ "```bash\n", "accelerate launch script.py\n", "```\n", "\n", "A single command to launch with `DeepSpeed`, Fully Sharded Data Parallelism, across single and multi CPUs and GPUs, and to train on TPUs[^1] too! \n", "\n", "[^1]: Without needing to modify your code and create a `_mp_fn`" ] }, { "cell_type": "markdown", "id": "d77a2576-5bb4-4a64-bcbf-1be9af3a232b", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Generate a device-specific configuration through `accelerate config`\n", "\n", "![](images/CLI.gif)" ] }, { "cell_type": "markdown", "id": "05c4fe0c-7b86-49a0-bd83-b9dab39e406f", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Or don't. `accelerate config` doesn't *have* to be done!\n", "\n", "```bash\n", "torchrun --nnodes=1 --nproc_per_node=2 script.py\n", "accelerate launch --multi_gpu --nproc_per_node=2 script.py\n", "```\n", "\n", "A quick default configuration can be made too:\n", "\n", "```bash \n", "accelerate config default\n", "```" ] }, { "cell_type": "markdown", "id": "c6047a52-3582-41c4-96e4-370ef269be94", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "With the `notebook_launcher` it's also possible to launch code directly from your Jupyter environment too!" ] }, { "cell_type": "markdown", "id": "1097c474-2ec5-4214-9477-ad6bac25317a", "metadata": {}, "source": [ "```python\n", "from accelerate import notebook_launcher\n", "notebook_launcher(\n", " training_loop_function, \n", " args, \n", " num_processes=2\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "d8ac070e-bb81-4624-b826-8cb072646ea7", "metadata": {}, "source": [ "```python\n", "Launching training on 2 GPUs.\n", "epoch 0: 88.12\n", "epoch 1: 91.73\n", "epoch 2: 92.58\n", "epoch 3: 93.90\n", "epoch 4: 94.71\n", "```" ] }, { "cell_type": "markdown", "id": "debe8ec0-4078-4f85-835e-f38c102ddfaf", "metadata": {}, "source": [ "# A Training Library\n", "\n", "Okay, will `accelerate launch` make `do_the_thing.py` use all my GPUs magically?" 
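] }, { "cell_type": "markdown", "id": "a1f30c2e", "metadata": {}, "source": [ "## A Training Library\n", "\n", "Short answer: no. `accelerate launch` only controls *how* processes get started; in a multi-GPU launch it sets environment variables such as `LOCAL_RANK` for each process, but it never edits your script. A minimal sketch (plain PyTorch, no 🤗 Accelerate yet) of what every launched process would still have to handle on its own:\n", "\n", "```{.python}\n", "import os\n", "\n", "import torch\n", "\n", "# The launcher sets LOCAL_RANK per process; without a training\n", "# library, the script must read it and pick its own device\n", "local_rank = int(os.environ.get(\"LOCAL_RANK\", 0))\n", "device = torch.device(f\"cuda:{local_rank}\" if torch.cuda.is_available() else \"cpu\")\n", "```"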
] }, { "cell_type": "markdown", "id": "7c8d4a16-7b57-4eee-8974-e39ff459c5e5", "metadata": {}, "source": [ "## A Training Library\n", "\n", "- Just showed that its possible using `accelerate launch` to *launch* a python script in various distributed environments\n", "- This does *not* mean that the script will just \"use\" that code and still run on the new compute efficiently.\n", "- Training on different computes often means *many* lines of code changed for each specific compute.\n", "- 🤗 `accelerate` solves this by ensuring the same code can be ran on a CPU or GPU, multiples, and on TPUs!" ] }, { "cell_type": "markdown", "id": "d8f7dfdd-5af5-4f6a-831b-d4e0a8f312d3", "metadata": {}, "source": [ "## A Training Library\n", "\n", "\n", "```{.python}\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " inputs = inputs.to(device)\n", " targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```" ] }, { "cell_type": "markdown", "id": "2c13ef82-d4c4-4564-8d3d-ef4c4ad3c9d8", "metadata": {}, "source": [ "## A Training Library {.smaller}" ] }, { "cell_type": "markdown", "id": "992909f7-8f5c-4138-8f31-94f305f564de", "metadata": {}, "source": [ ":::: {.columns}\n", "::: {.column width=\"43%\"}\n", "


\n", "```{.python code-line-numbers=\"5-6,9\"}\n", "# For alignment purposes\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " inputs = inputs.to(device)\n", " targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```\n", ":::\n", "::: {.column width=\"57%\"}\n", "```{.python code-line-numbers=\"1-7,12-13,16\"}\n", "from accelerate import Accelerator\n", "accelerator = Accelerator()\n", "dataloader, model, optimizer scheduler = (\n", " accelerator.prepare(\n", " dataloader, model, optimizer, scheduler\n", " )\n", ")\n", "\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " # inputs = inputs.to(device)\n", " # targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " accelerator.backward(loss) # loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```\n", ":::\n", "\n", "::::" ] }, { "cell_type": "markdown", "id": "4028a9a1-5c25-41a3-9c76-f116d6fbb1db", "metadata": {}, "source": [ "## A Training Library\n", "\n", "What all happened in `Accelerator.prepare`?\n", "\n", "::: {.incremental}\n", "1. `Accelerator` looked at the configuration\n", "2. The `dataloader` was converted into one that can dispatch each batch onto a seperate GPU\n", "3. The `model` was wrapped with the appropriate DDP wrapper from either `torch.distributed` or `torch_xla`\n", "4. The `optimizer` and `scheduler` were both converted into an `AcceleratedOptimizer` and `AcceleratedScheduler` which knows how to handle any distributed scenario\n", ":::" ] }, { "cell_type": "markdown", "id": "92e112c3-cdf9-4d84-8076-df33a79da641", "metadata": {}, "source": [ "## Let's bring in `fastai`\n", "\n", "To utilize the `notebook_launcher` and `accelerate` at once it requires a few steps:\n", "\n", "1. Migrate the `DataLoaders` creation to inside the `train` function\n", "2. Use the `distrib_ctx` context manager fastai provides\n", "3. Train!" 
] }, { "cell_type": "markdown", "id": "ba04e9b9-4589-4a08-adc3-cb2b4ec6ad43", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n", "\n", "```{.python}\n", "from fastai.vision.all import *\n", "from fastai.distributed import *\n", "\n", "path = untar_data(URLs.PETS)/'images'\n", "\n", "def train():\n", " dls = ImageDataLoaders.from_name_func(\n", " path, get_image_files(path), valid_pct=0.2,\n", " label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n", " learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n", " with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n", " learn.fine_tune(1)\n", "\n", "notebook_launcher(train, num_processes=2)\n", "```" ] }, { "cell_type": "markdown", "id": "95c138db-6ef1-4c20-ba76-5040deca83e1", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n", "\n", "```{.python code-line-numbers=\"1,5,10,13\"}\n", "from fastai.vision.all import *\n", "from fastai.distributed import *\n", "\n", "path = untar_data(URLs.PETS)/'images'\n", "\n", "def train():\n", " dls = ImageDataLoaders.from_name_func(\n", " path, get_image_files(path), valid_pct=0.2,\n", " label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n", " learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n", " with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n", " learn.fine_tune(1)\n", "\n", "notebook_launcher(train, num_processes=2)\n", "```" ] }, { "cell_type": "markdown", "id": "d4e4a37d-8044-4b0b-91a9-ca8ec8d54895", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "The key important parts to remember are:\n", "\n", "- **No** code should *touch* the GPU before calling `notebook_launcher`\n", "- Generally it's recommended to let fastai handle gradient accumulation and mixed precision in this case, so use their in-house Callbacks\n", "- Use the `notebook_launcher` to run the training function after everything is complete." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]" }, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } } }, "nbformat": 4, "nbformat_minor": 5 }