{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "7d29e9ae", "metadata": {}, "source": [ "---\n", "title: \"Accelerate, Three Powerful Sublibraries for PyTorch\" \n", "author: \"Zachary Mueller\"\n", "format: \n", " revealjs: \n", " theme: moon \n", " fig-format: png\n", "categories: [Lesson 6]\n", "---" ] }, { "cell_type": "markdown", "id": "d2aba289-d771-4be9-a4ec-99ab268c5586", "metadata": {}, "source": [ "## What is 🤗 Accelerate?" ] }, { "cell_type": "markdown", "id": "329b61de-d7c9-46d2-adff-7a912ba93356", "metadata": {}, "source": [ "```{mermaid}\n", "%%| fig-height: 6\n", "graph LR\n", " A{\"🤗 Accelerate#32;\"}\n", " A --> B[\"Launching
Interface#32;\"]\n", " A --> C[\"Training Library#32;\"]\n", " A --> D[\"Big Model
Inference#32;\"]\n", "```" ] }, { "cell_type": "markdown", "id": "0480b2df-a19c-4b93-b98a-b98da4d0d825", "metadata": {}, "source": [ "# A Launching Interface\n", "\n", "Can't I just use `python do_the_thing.py`?" ] }, { "cell_type": "markdown", "id": "c6d5b3da-aad3-4387-b9f3-65384b521bb9", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Launching scripts in different environments is complicated:" ] }, { "cell_type": "markdown", "id": "d2c2079d-ab7d-4e98-94a9-d16093b81ea6", "metadata": {}, "source": [ "- ```bash \n", "python script.py\n", "```\n", "\n", "- ```bash \n", "torchrun --nnodes=1 --nproc_per_node=2 script.py\n", "```\n", "\n", "- ```bash \n", "deepspeed --num_gpus=2 script.py\n", "```\n", "\n", "And more!" ] }, { "cell_type": "markdown", "id": "77bdbbaa-acaa-4ed3-b809-82e836db93f7", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "But it doesn't have to be:" ] }, { "cell_type": "markdown", "id": "21456afb-7ae6-4bb6-81ea-e6de6365c13f", "metadata": {}, "source": [ "```bash\n", "accelerate launch script.py\n", "```\n", "\n", "A single command to launch with `DeepSpeed`, Fully Sharded Data Parallelism, across single and multi CPUs and GPUs, and to train on TPUs[^1] too! \n", "\n", "[^1]: Without needing to modify your code and create a `_mp_fn`" ] }, { "cell_type": "markdown", "id": "d77a2576-5bb4-4a64-bcbf-1be9af3a232b", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Generate a device-specific configuration through `accelerate config`\n", "\n", "![](images/CLI.gif)" ] }, { "cell_type": "markdown", "id": "05c4fe0c-7b86-49a0-bd83-b9dab39e406f", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "Or don't. `accelerate config` doesn't *have* to be done!\n", "\n", "```bash\n", "torchrun --nnodes=1 --nproc_per_node=2 script.py\n", "accelerate launch --multi_gpu --nproc_per_node=2 script.py\n", "```\n", "\n", "A quick default configuration can be made too:\n", "\n", "```bash \n", "accelerate config default\n", "```" ] }, { "cell_type": "markdown", "id": "c6047a52-3582-41c4-96e4-370ef269be94", "metadata": {}, "source": [ "## A Launching Interface\n", "\n", "With the `notebook_launcher` it's also possible to launch code directly from your Jupyter environment too!" ] }, { "cell_type": "markdown", "id": "1097c474-2ec5-4214-9477-ad6bac25317a", "metadata": {}, "source": [ "```python\n", "from accelerate import notebook_launcher\n", "notebook_launcher(\n", " training_loop_function, \n", " args, \n", " num_processes=2\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "d8ac070e-bb81-4624-b826-8cb072646ea7", "metadata": {}, "source": [ "```python\n", "Launching training on 2 GPUs.\n", "epoch 0: 88.12\n", "epoch 1: 91.73\n", "epoch 2: 92.58\n", "epoch 3: 93.90\n", "epoch 4: 94.71\n", "```" ] }, { "cell_type": "markdown", "id": "debe8ec0-4078-4f85-835e-f38c102ddfaf", "metadata": {}, "source": [ "# A Training Library\n", "\n", "Okay, will `accelerate launch` make `do_the_thing.py` use all my GPUs magically?" 
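] }, { "cell_type": "markdown", "id": "a1f30c2e", "metadata": {}, "source": [ "## A Training Library\n", "\n", "Short answer: no. `accelerate launch` only controls *how* processes get started; in a multi-GPU launch it sets environment variables such as `LOCAL_RANK` for each process, but it never edits your script. A minimal sketch (plain PyTorch, no 🤗 Accelerate yet) of what every launched process would still have to handle on its own:\n", "\n", "```{.python}\n", "import os\n", "\n", "import torch\n", "\n", "# The launcher sets LOCAL_RANK per process; without a training\n", "# library, the script must read it and pick its own device\n", "local_rank = int(os.environ.get(\"LOCAL_RANK\", 0))\n", "device = torch.device(f\"cuda:{local_rank}\" if torch.cuda.is_available() else \"cpu\")\n", "```"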
] }, { "cell_type": "markdown", "id": "7c8d4a16-7b57-4eee-8974-e39ff459c5e5", "metadata": {}, "source": [ "## A Training Library\n", "\n", "- Just showed that its possible using `accelerate launch` to *launch* a python script in various distributed environments\n", "- This does *not* mean that the script will just \"use\" that code and still run on the new compute efficiently.\n", "- Training on different computes often means *many* lines of code changed for each specific compute.\n", "- 🤗 `accelerate` solves this by ensuring the same code can be ran on a CPU or GPU, multiples, and on TPUs!" ] }, { "cell_type": "markdown", "id": "d8f7dfdd-5af5-4f6a-831b-d4e0a8f312d3", "metadata": {}, "source": [ "## A Training Library\n", "\n", "\n", "```{.python}\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " inputs = inputs.to(device)\n", " targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```" ] }, { "cell_type": "markdown", "id": "2c13ef82-d4c4-4564-8d3d-ef4c4ad3c9d8", "metadata": {}, "source": [ "## A Training Library {.smaller}" ] }, { "cell_type": "markdown", "id": "992909f7-8f5c-4138-8f31-94f305f564de", "metadata": {}, "source": [ ":::: {.columns}\n", "::: {.column width=\"43%\"}\n", "


\n", "```{.python code-line-numbers=\"5-6,9\"}\n", "# For alignment purposes\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " inputs = inputs.to(device)\n", " targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```\n", ":::\n", "::: {.column width=\"57%\"}\n", "```{.python code-line-numbers=\"1-7,12-13,16\"}\n", "from accelerate import Accelerator\n", "accelerator = Accelerator()\n", "dataloader, model, optimizer scheduler = (\n", " accelerator.prepare(\n", " dataloader, model, optimizer, scheduler\n", " )\n", ")\n", "\n", "for batch in dataloader:\n", " optimizer.zero_grad()\n", " inputs, targets = batch\n", " # inputs = inputs.to(device)\n", " # targets = targets.to(device)\n", " outputs = model(inputs)\n", " loss = loss_function(outputs, targets)\n", " accelerator.backward(loss) # loss.backward()\n", " optimizer.step()\n", " scheduler.step()\n", "```\n", ":::\n", "\n", "::::" ] }, { "cell_type": "markdown", "id": "4028a9a1-5c25-41a3-9c76-f116d6fbb1db", "metadata": {}, "source": [ "## A Training Library\n", "\n", "What all happened in `Accelerator.prepare`?\n", "\n", "::: {.incremental}\n", "1. `Accelerator` looked at the configuration\n", "2. The `dataloader` was converted into one that can dispatch each batch onto a seperate GPU\n", "3. The `model` was wrapped with the appropriate DDP wrapper from either `torch.distributed` or `torch_xla`\n", "4. The `optimizer` and `scheduler` were both converted into an `AcceleratedOptimizer` and `AcceleratedScheduler` which knows how to handle any distributed scenario\n", ":::" ] }, { "cell_type": "markdown", "id": "92e112c3-cdf9-4d84-8076-df33a79da641", "metadata": {}, "source": [ "## Let's bring in `fastai`\n", "\n", "To utilize the `notebook_launcher` and `accelerate` at once it requires a few steps:\n", "\n", "1. Migrate the `DataLoaders` creation to inside the `train` function\n", "2. Use the `distrib_ctx` context manager fastai provides\n", "3. Train!" 
] }, { "cell_type": "markdown", "id": "ba04e9b9-4589-4a08-adc3-cb2b4ec6ad43", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n", "\n", "```{.python}\n", "from fastai.vision.all import *\n", "from fastai.distributed import *\n", "\n", "path = untar_data(URLs.PETS)/'images'\n", "\n", "def train():\n", " dls = ImageDataLoaders.from_name_func(\n", " path, get_image_files(path), valid_pct=0.2,\n", " label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n", " learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n", " with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n", " learn.fine_tune(1)\n", "\n", "notebook_launcher(train, num_processes=2)\n", "```" ] }, { "cell_type": "markdown", "id": "95c138db-6ef1-4c20-ba76-5040deca83e1", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "Here it is in code, based on the [distributed app examples](https://docs.fast.ai/examples/distributed_app_examples.html)\n", "\n", "```{.python code-line-numbers=\"1,5,10,13\"}\n", "from fastai.vision.all import *\n", "from fastai.distributed import *\n", "\n", "path = untar_data(URLs.PETS)/'images'\n", "\n", "def train():\n", " dls = ImageDataLoaders.from_name_func(\n", " path, get_image_files(path), valid_pct=0.2,\n", " label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))\n", " learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()\n", " with learn.distrib_ctx(in_notebook=True, sync_bn=False):\n", " learn.fine_tune(1)\n", "\n", "notebook_launcher(train, num_processes=2)\n", "```" ] }, { "cell_type": "markdown", "id": "d4e4a37d-8044-4b0b-91a9-ca8ec8d54895", "metadata": {}, "source": [ "## Let's bring `fastai`\n", "\n", "The key important parts to remember are:\n", "\n", "- **No** code should *touch* the GPU before calling `notebook_launcher`\n", "- Generally it's recommended to let fastai handle gradient accumulation and mixed precision in this case, so use their in-house Callbacks\n", "- Use the `notebook_launcher` to run the training function after everything is complete." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]" }, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } } }, "nbformat": 4, "nbformat_minor": 5 }