boris committed on May 10, 2022

Commit badd15c • 1 Parent(s): 8845d77

fix(colab): use correct param name for CLIP
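
The commit message refers to a parameter name in the CLIP scoring cell. In the removed version of the notebook, CLIP is loaded with `_do_init=False`, so the weights come back as a separate `clip_params` value, yet the scoring call passes `clip.params`. The corrected side of that cell is not visible in this excerpt, so the following is only a minimal sketch of the likely issue. `LazyModel` and `load_lazy` are hypothetical stand-ins written for illustration, not the real `FlaxCLIPModel` API.

```python
class LazyModel:
    """Hypothetical stand-in for a model loaded with `_do_init=False`."""

    def __init__(self):
        # The model object holds no weights of its own.
        self._is_initialized = False

    @property
    def params(self):
        # Mimics a model that refuses to expose weights it never initialized.
        if not self._is_initialized:
            raise ValueError("params are not stored on the model; pass them explicitly")
        return {}

    def __call__(self, params, x):
        # Toy forward pass: scale the input by a single weight.
        return params["scale"] * x


def load_lazy():
    """Return (model, params), as the notebook's loading calls do."""
    return LazyModel(), {"scale": 2.0}


clip, clip_params = load_lazy()

# Reading the weights from the model object fails in this sketch.
try:
    clip(clip.params, 3.0)
    failed = False
except ValueError:
    failed = True

# Passing the separately returned weights works.
score = clip(clip_params, 3.0)
```

In the notebook itself, the analogous change would be passing the replicated `clip_params` to `p_clip` instead of `clip.params`, assuming that is what the commit does.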

Files changed (1)

tools/inference/inference_pipeline.ipynb +466 -849

@@ -1,865 +1,482 @@
 {
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "118UKH5bWCGa"
-   },
-   "source": [
-    "# DALL·E mini - Inference pipeline\n",
-    "\n",
-    "*Generate images from a text prompt*\n",
-    "\n",
-    "<img src=\"https://github.com/borisdayma/dalle-mini/blob/main/img/logo.png?raw=true\" width=\"200\">\n",
-    "\n",
-    "This notebook illustrates [DALL·E mini](https://github.com/borisdayma/dalle-mini) inference pipeline.\n",
-    "\n",
-    "Just want to play? Use directly [DALL·E mini app](https://huggingface.co/spaces/dalle-mini/dalle-mini).\n",
-    "\n",
-    "For more understanding of the model, refer to [the report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "dS8LbaonYm3a"
-   },
-   "source": [
-    "## 🛠️ Installation and set-up"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "uzjAM2GBYpZX"
-   },
-   "outputs": [],
-   "source": [
-    "# Install required libraries\n",
-    "!pip install -q git+https://github.com/huggingface/transformers.git\n",
-    "!pip install -q git+https://github.com/patil-suraj/vqgan-jax.git\n",
-    "!pip install -q git+https://github.com/borisdayma/dalle-mini.git"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "ozHzTkyv8cqU"
-   },
-   "source": [
-    "We load required models:\n",
-    "* DALL·E mini for text to encoded images\n",
-    "* VQGAN for decoding images\n",
-    "* CLIP for scoring predictions"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "K6CxW2o42f-w"
-   },
-   "outputs": [],
-   "source": [
-    "# Model references\n",
-    "\n",
-    "# dalle-mega\n",
-    "DALLE_MODEL = \"dalle-mini/dalle-mini/mega-1-fp16:latest\"  # can be wandb artifact or 🤗 Hub or local folder or google bucket\n",
-    "DALLE_COMMIT_ID = None\n",
-    "\n",
-    "# if the notebook crashes too often you can use dalle-mini instead by uncommenting below line\n",
-    "# DALLE_MODEL = \"dalle-mini/dalle-mini/mini-1:v0\"\n",
-    "\n",
-    "# VQGAN model\n",
-    "VQGAN_REPO = \"dalle-mini/vqgan_imagenet_f16_16384\"\n",
-    "VQGAN_COMMIT_ID = \"e93a26e7707683d349bf5d5c41c5b0ef69b677a9\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
     },
-    "id": "Yv-aR3t4Oe5v",
-    "outputId": "3097b2c7-5dac-475f-edde-898799dd7294"
-   },
-   "outputs": [],
-   "source": [
-    "import jax\n",
-    "import jax.numpy as jnp\n",
-    "\n",
-    "# check how many devices are available\n",
-    "jax.local_device_count()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
     },
-    "id": "92zYmvsQ38vL",
-    "outputId": "d897dfdb-dae7-4026-da36-8b23dce066e8"
-   },
-   "outputs": [],
-   "source": [
-    "# Load models & tokenizer\n",
-    "from dalle_mini import DalleBart, DalleBartProcessor\n",
-    "from vqgan_jax.modeling_flax_vqgan import VQModel\n",
-    "from transformers import CLIPProcessor, FlaxCLIPModel\n",
-    "\n",
-    "# Load dalle-mini\n",
-    "model, params = DalleBart.from_pretrained(\n",
-    "    DALLE_MODEL, revision=DALLE_COMMIT_ID, dtype=jnp.float16, _do_init=False\n",
-    ")\n",
-    "\n",
-    "# Load VQGAN\n",
-    "vqgan, vqgan_params = VQModel.from_pretrained(\n",
-    "    VQGAN_REPO, revision=VQGAN_COMMIT_ID, _do_init=False\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "o_vH2X1tDtzA"
-   },
-   "source": [
-    "Model parameters are replicated on each device for faster inference."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "wtvLoM48EeVw"
-   },
-   "outputs": [],
-   "source": [
-    "from flax.jax_utils import replicate\n",
-    "\n",
-    "params = replicate(params)\n",
-    "vqgan_params = replicate(vqgan_params)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "0A9AHQIgZ_qw"
-   },
-   "source": [
-    "Model functions are compiled and parallelized to take advantage of multiple devices."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "sOtoOmYsSYPz"
-   },
-   "outputs": [],
-   "source": [
-    "from functools import partial\n",
-    "\n",
-    "# model inference\n",
-    "@partial(jax.pmap, axis_name=\"batch\", static_broadcasted_argnums=(3, 4, 5, 6))\n",
-    "def p_generate(\n",
-    "    tokenized_prompt, key, params, top_k, top_p, temperature, condition_scale\n",
-    "):\n",
-    "    return model.generate(\n",
-    "        **tokenized_prompt,\n",
-    "        prng_key=key,\n",
-    "        params=params,\n",
-    "        top_k=top_k,\n",
-    "        top_p=top_p,\n",
-    "        temperature=temperature,\n",
-    "        condition_scale=condition_scale,\n",
-    "    )\n",
-    "\n",
-    "\n",
-    "# decode image\n",
-    "@partial(jax.pmap, axis_name=\"batch\")\n",
-    "def p_decode(indices, params):\n",
-    "    return vqgan.decode_code(indices, params=params)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "HmVN6IBwapBA"
-   },
-   "source": [
-    "Keys are passed to the model on each device to generate unique inference per device."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "4CTXmlUkThhX"
-   },
-   "outputs": [],
-   "source": [
-    "import random\n",
-    "\n",
-    "# create a random key\n",
-    "seed = random.randint(0, 2**32 - 1)\n",
-    "key = jax.random.PRNGKey(seed)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "BrnVyCo81pij"
-   },
-   "source": [
-    "## 🖍 Text Prompt"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "rsmj0Aj5OQox"
-   },
-   "source": [
-    "Our model requires processing prompts."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
     },
-    "id": "YjjhUychOVxm",
-    "outputId": "a286f17a-a388-4754-ec4d-0464c0666c90"
-   },
-   "outputs": [],
-   "source": [
-    "from dalle_mini import DalleBartProcessor\n",
-    "\n",
-    "processor = DalleBartProcessor.from_pretrained(DALLE_MODEL, revision=DALLE_COMMIT_ID)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "BQ7fymSPyvF_"
-   },
-   "source": [
-    "Let's define a text prompt."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "x_0vI9ge1oKr"
-   },
-   "outputs": [],
-   "source": [
-    "prompt = \"sunset over a lake in the mountains\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "VKjEZGjtO49k"
-   },
-   "outputs": [],
-   "source": [
-    "tokenized_prompt = processor([prompt])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "-CEJBnuJOe5z"
-   },
-   "source": [
-    "Finally we replicate it onto each device."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "lQePgju5Oe5z"
-   },
-   "outputs": [],
-   "source": [
-    "tokenized_prompt = replicate(tokenized_prompt)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "phQ9bhjRkgAZ"
-   },
-   "source": [
-    "## 🎨 Generate images\n",
-    "\n",
-    "We generate images using dalle-mini model and decode them with the VQGAN."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "d0wVkXpKqnHA"
-   },
-   "outputs": [],
-   "source": [
-    "# number of predictions\n",
-    "n_predictions = 8\n",
-    "\n",
-    "# We can customize generation parameters\n",
-    "gen_top_k = None\n",
-    "gen_top_p = None\n",
-    "temperature = None\n",
-    "cond_scale = 3.0"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/",
-     "height": 1000,
-     "referenced_widgets": [
-      "cef76449b8d74217ae36c56be3990eec",
-      "7be07ba7cfe642a596509c756dcefddc",
-      "2a02378499fc414299f17a2d5dcac867",
-      "427d47d9423441d286ae80a637ae35a0",
-      "cb157fd4e37041d1beae29eaa729c8ff",
-      "73413668398b45dfa8484a2c2be778ec",
-      "e7d108a4b168442fb2048f58ddeb0a18",
-      "5e81a141422f432395055f5cafb07016",
-      "5f476a929da84fa985b2e980459da7b9",
-      "f3b643a0ca2444fd959fff9b45d79d27",
-      "82b87345233549d699ce3fd8080fa988"
-     ]
     },
-    "id": "SDjEx9JxR3v8",
-    "outputId": "8f4287a7-aff9-41ef-a026-02265de0c205"
-   },
-   "outputs": [],
-   "source": [
-    "from flax.training.common_utils import shard_prng_key\n",
-    "import numpy as np\n",
-    "from PIL import Image\n",
-    "from tqdm.notebook import trange\n",
-    "\n",
-    "print(f\"Prompt: {prompt}\\n\")\n",
-    "# generate images\n",
-    "images = []\n",
-    "for i in trange(max(n_predictions // jax.device_count(), 1)):\n",
-    "    # get a new key\n",
-    "    key, subkey = jax.random.split(key)\n",
-    "    # generate images\n",
-    "    encoded_images = p_generate(\n",
-    "        tokenized_prompt,\n",
-    "        shard_prng_key(subkey),\n",
-    "        params,\n",
-    "        gen_top_k,\n",
-    "        gen_top_p,\n",
-    "        temperature,\n",
-    "        cond_scale,\n",
-    "    )\n",
-    "    # remove BOS\n",
-    "    encoded_images = encoded_images.sequences[..., 1:]\n",
-    "    # decode images\n",
-    "    decoded_images = p_decode(encoded_images, vqgan_params)\n",
-    "    decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))\n",
-    "    for decoded_img in decoded_images:\n",
-    "        img = Image.fromarray(np.asarray(decoded_img * 255, dtype=np.uint8))\n",
-    "        images.append(img)\n",
-    "        display(img)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "tw02wG9zGmyB"
-   },
-   "source": [
-    "## 🏅 Optional: Rank images by CLIP score\n",
-    "\n",
-    "We can rank images according to CLIP.\n",
-    "\n",
-    "**Note: your session may crash if you don't have a subscription to Colab Pro.**"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "RGjlIW_f6GA0"
-   },
-   "outputs": [],
-   "source": [
-    "# CLIP model\n",
-    "CLIP_REPO = \"openai/clip-vit-base-patch32\"\n",
-    "CLIP_COMMIT_ID = None\n",
-    "\n",
-    "# Load CLIP\n",
-    "clip, clip_params = FlaxCLIPModel.from_pretrained(\n",
-    "    CLIP_REPO, revision=CLIP_COMMIT_ID, dtype=jnp.float16, _do_init=False\n",
-    ")\n",
-    "clip_processor = CLIPProcessor.from_pretrained(CLIP_REPO, revision=CLIP_COMMIT_ID)\n",
-    "clip_params = replicate(clip_params)\n",
-    "\n",
-    "# score images\n",
-    "@partial(jax.pmap, axis_name=\"batch\")\n",
-    "def p_clip(inputs, params):\n",
-    "    logits = clip(params=params, **inputs).logits_per_image\n",
-    "    return logits"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "FoLXpjCmGpju"
-   },
-   "outputs": [],
-   "source": [
-    "from flax.training.common_utils import shard\n",
-    "\n",
-    "# CLIP model\n",
-    "CLIP_REPO = \"openai/clip-vit-base-patch32\"\n",
-    "CLIP_COMMIT_ID = None\n",
-    "\n",
-    "# Load CLIP\n",
-    "clip, clip_params = FlaxCLIPModel.from_pretrained(\n",
-    "    CLIP_REPO, revision=CLIP_COMMIT_ID, _do_init=False\n",
-    ")\n",
-    "clip_processor = CLIPProcessor.from_pretrained(CLIP_REPO, revision=CLIP_COMMIT_ID)\n",
-    "clip_params = replicate(clip_params)\n",
-    "\n",
-    "# score images\n",
-    "@partial(jax.pmap, axis_name=\"batch\")\n",
-    "def p_clip(inputs, params):\n",
-    "    logits = clip(params=params, **inputs).logits_per_image\n",
-    "    return logits\n",
-    "\n",
-    "\n",
-    "# get clip scores\n",
-    "clip_inputs = clip_processor(\n",
-    "    text=[prompt] * jax.device_count(),\n",
-    "    images=images,\n",
-    "    return_tensors=\"np\",\n",
-    "    padding=\"max_length\",\n",
-    "    max_length=77,\n",
-    "    truncation=True,\n",
-    ").data\n",
-    "logits = p_clip(shard(clip_inputs), clip.params)\n",
-    "logits = logits.squeeze().flatten()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "4AAWRm70LgED"
-   },
-   "source": [
-    "Let's now display images ranked by CLIP score."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "zsgxxubLLkIu"
-   },
-   "outputs": [],
-   "source": [
-    "print(f\"Prompt: {prompt}\\n\")\n",
-    "for idx in logits.argsort()[::-1]:\n",
-    "    display(images[idx])\n",
-    "    print(f\"Score: {logits[idx]:.2f}\\n\")"
-   ]
-  }
- ],
- "metadata": {
-  "accelerator": "GPU",
-  "colab": {
-   "collapsed_sections": [],
-   "machine_shape": "hm",
-   "name": "DALL·E mini - Inference pipeline.ipynb",
-   "provenance": []
-  },
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.7"
-  },
-  "widgets": {
-   "application/vnd.jupyter.widget-state+json": {
-    "2a02378499fc414299f17a2d5dcac867": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "FloatProgressModel",
-     "state": {
-      "_dom_classes": [],
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "FloatProgressModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/controls",
-      "_view_module_version": "1.5.0",
-      "_view_name": "ProgressView",
-      "bar_style": "",
-      "description": "",
-      "description_tooltip": null,
-      "layout": "IPY_MODEL_5e81a141422f432395055f5cafb07016",
-      "max": 8,
-      "min": 0,
-      "orientation": "horizontal",
-      "style": "IPY_MODEL_5f476a929da84fa985b2e980459da7b9",
-      "value": 5
-     }
     },
-    "427d47d9423441d286ae80a637ae35a0": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "HTMLModel",
-     "state": {
-      "_dom_classes": [],
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "HTMLModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/controls",
-      "_view_module_version": "1.5.0",
-      "_view_name": "HTMLView",
-      "description": "",
-      "description_tooltip": null,
-      "layout": "IPY_MODEL_f3b643a0ca2444fd959fff9b45d79d27",
-      "placeholder": "",
-      "style": "IPY_MODEL_82b87345233549d699ce3fd8080fa988",
-      "value": " 5/8 [04:25&lt;02:39, 53.09s/it]"
-     }
     },
-    "5e81a141422f432395055f5cafb07016": {
-     "model_module": "@jupyter-widgets/base",
-     "model_module_version": "1.2.0",
-     "model_name": "LayoutModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/base",
-      "_model_module_version": "1.2.0",
-      "_model_name": "LayoutModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "LayoutView",
-      "align_content": null,
-      "align_items": null,
-      "align_self": null,
-      "border": null,
-      "bottom": null,
-      "display": null,
-      "flex": null,
-      "flex_flow": null,
-      "grid_area": null,
-      "grid_auto_columns": null,
-      "grid_auto_flow": null,
-      "grid_auto_rows": null,
-      "grid_column": null,
-      "grid_gap": null,
-      "grid_row": null,
-      "grid_template_areas": null,
-      "grid_template_columns": null,
-      "grid_template_rows": null,
-      "height": null,
-      "justify_content": null,
-      "justify_items": null,
-      "left": null,
-      "margin": null,
-      "max_height": null,
-      "max_width": null,
-      "min_height": null,
-      "min_width": null,
-      "object_fit": null,
-      "object_position": null,
-      "order": null,
-      "overflow": null,
-      "overflow_x": null,
-      "overflow_y": null,
-      "padding": null,
-      "right": null,
-      "top": null,
-      "visibility": null,
-      "width": null
-     }
     },
-    "5f476a929da84fa985b2e980459da7b9": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "ProgressStyleModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "ProgressStyleModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "StyleView",
-      "bar_color": null,
-      "description_width": ""
-     }
     },
-    "73413668398b45dfa8484a2c2be778ec": {
-     "model_module": "@jupyter-widgets/base",
-     "model_module_version": "1.2.0",
-     "model_name": "LayoutModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/base",
-      "_model_module_version": "1.2.0",
-      "_model_name": "LayoutModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "LayoutView",
-      "align_content": null,
-      "align_items": null,
-      "align_self": null,
-      "border": null,
-      "bottom": null,
-      "display": null,
-      "flex": null,
-      "flex_flow": null,
-      "grid_area": null,
-      "grid_auto_columns": null,
-      "grid_auto_flow": null,
-      "grid_auto_rows": null,
-      "grid_column": null,
-      "grid_gap": null,
-      "grid_row": null,
-      "grid_template_areas": null,
-      "grid_template_columns": null,
-      "grid_template_rows": null,
-      "height": null,
-      "justify_content": null,
-      "justify_items": null,
-      "left": null,
-      "margin": null,
-      "max_height": null,
-      "max_width": null,
-      "min_height": null,
-      "min_width": null,
-      "object_fit": null,
-      "object_position": null,
-      "order": null,
-      "overflow": null,
-      "overflow_x": null,
-      "overflow_y": null,
-      "padding": null,
-      "right": null,
-      "top": null,
-      "visibility": null,
-      "width": null
-     }
     },
-    "7be07ba7cfe642a596509c756dcefddc": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "HTMLModel",
-     "state": {
-      "_dom_classes": [],
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "HTMLModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/controls",
-      "_view_module_version": "1.5.0",
-      "_view_name": "HTMLView",
-      "description": "",
-      "description_tooltip": null,
-      "layout": "IPY_MODEL_73413668398b45dfa8484a2c2be778ec",
-      "placeholder": "",
-      "style": "IPY_MODEL_e7d108a4b168442fb2048f58ddeb0a18",
-      "value": " 62%"
-     }
     },
-    "82b87345233549d699ce3fd8080fa988": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "DescriptionStyleModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "DescriptionStyleModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "StyleView",
-      "description_width": ""
-     }
     },
-    "cb157fd4e37041d1beae29eaa729c8ff": {
-     "model_module": "@jupyter-widgets/base",
-     "model_module_version": "1.2.0",
-     "model_name": "LayoutModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/base",
-      "_model_module_version": "1.2.0",
-      "_model_name": "LayoutModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "LayoutView",
-      "align_content": null,
-      "align_items": null,
-      "align_self": null,
-      "border": null,
-      "bottom": null,
-      "display": null,
-      "flex": null,
-      "flex_flow": null,
-      "grid_area": null,
-      "grid_auto_columns": null,
-      "grid_auto_flow": null,
-      "grid_auto_rows": null,
-      "grid_column": null,
-      "grid_gap": null,
-      "grid_row": null,
-      "grid_template_areas": null,
-      "grid_template_columns": null,
-      "grid_template_rows": null,
-      "height": null,
-      "justify_content": null,
-      "justify_items": null,
-      "left": null,
-      "margin": null,
-      "max_height": null,
-      "max_width": null,
-      "min_height": null,
-      "min_width": null,
-      "object_fit": null,
-      "object_position": null,
-      "order": null,
-      "overflow": null,
-      "overflow_x": null,
-      "overflow_y": null,
-      "padding": null,
-      "right": null,
-      "top": null,
-      "visibility": null,
-      "width": null
-     }
     },
-    "cef76449b8d74217ae36c56be3990eec": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "HBoxModel",
-     "state": {
-      "_dom_classes": [],
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "HBoxModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/controls",
-      "_view_module_version": "1.5.0",
-      "_view_name": "HBoxView",
-      "box_style": "",
-      "children": [
-       "IPY_MODEL_7be07ba7cfe642a596509c756dcefddc",
-       "IPY_MODEL_2a02378499fc414299f17a2d5dcac867",
-       "IPY_MODEL_427d47d9423441d286ae80a637ae35a0"
-      ],
-      "layout": "IPY_MODEL_cb157fd4e37041d1beae29eaa729c8ff"
-     }
     },
-    "e7d108a4b168442fb2048f58ddeb0a18": {
-     "model_module": "@jupyter-widgets/controls",
-     "model_module_version": "1.5.0",
-     "model_name": "DescriptionStyleModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/controls",
-      "_model_module_version": "1.5.0",
-      "_model_name": "DescriptionStyleModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "StyleView",
-      "description_width": ""
-     }
     },
-    "f3b643a0ca2444fd959fff9b45d79d27": {
-     "model_module": "@jupyter-widgets/base",
-     "model_module_version": "1.2.0",
-     "model_name": "LayoutModel",
-     "state": {
-      "_model_module": "@jupyter-widgets/base",
-      "_model_module_version": "1.2.0",
-      "_model_name": "LayoutModel",
-      "_view_count": null,
-      "_view_module": "@jupyter-widgets/base",
-      "_view_module_version": "1.2.0",
-      "_view_name": "LayoutView",
-      "align_content": null,
-      "align_items": null,
-      "align_self": null,
-      "border": null,
-      "bottom": null,
-      "display": null,
-      "flex": null,
-      "flex_flow": null,
-      "grid_area": null,
-      "grid_auto_columns": null,
-      "grid_auto_flow": null,
-      "grid_auto_rows": null,
-      "grid_column": null,
-      "grid_gap": null,
-      "grid_row": null,
-      "grid_template_areas": null,
-      "grid_template_columns": null,
-      "grid_template_rows": null,
-      "height": null,
-      "justify_content": null,
-      "justify_items": null,
-      "left": null,
-      "margin": null,
-      "max_height": null,
-      "max_width": null,
-      "min_height": null,
-      "min_width": null,
-      "object_fit": null,
-      "object_position": null,
-      "order": null,
-      "overflow": null,
-      "overflow_x": null,
-      "overflow_y": null,
-      "padding": null,
-      "right": null,
-      "top": null,
-      "visibility": null,
-      "width": null
-     }
     }
-   }
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}

 {
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "118UKH5bWCGa"
+      },
+      "source": [
+        "# DALL·E mini - Inference pipeline\n",
+        "\n",
+        "*Generate images from a text prompt*\n",
+        "\n",
+        "<img src=\"https://github.com/borisdayma/dalle-mini/blob/main/img/logo.png?raw=true\" width=\"200\">\n",
+        "\n",
+        "This notebook illustrates [DALL·E mini](https://github.com/borisdayma/dalle-mini) inference pipeline.\n",
+        "\n",
+        "Just want to play? Use directly [DALL·E mini app](https://huggingface.co/spaces/dalle-mini/dalle-mini).\n",
+        "\n",
+        "For more understanding of the model, refer to [the report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA)."
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "dS8LbaonYm3a"
+      },
+      "source": [
+        "## 🛠️ Installation and set-up"
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "uzjAM2GBYpZX"
+      },
+      "outputs": [],
+      "source": [
+        "# Install required libraries\n",
+        "!pip install -q git+https://github.com/huggingface/transformers.git\n",
+        "!pip install -q git+https://github.com/patil-suraj/vqgan-jax.git\n",
+        "!pip install -q git+https://github.com/borisdayma/dalle-mini.git"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ozHzTkyv8cqU"
+      },
+      "source": [
+        "We load required models:\n",
+        "* DALL·E mini for text to encoded images\n",
+        "* VQGAN for decoding images\n",
+        "* CLIP for scoring predictions"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "K6CxW2o42f-w"
+      },
+      "outputs": [],
+      "source": [
+        "# Model references\n",
+        "\n",
+        "# dalle-mega\n",
+        "DALLE_MODEL = \"dalle-mini/dalle-mini/mega-1-fp16:latest\"  # can be wandb artifact or 🤗 Hub or local folder or google bucket\n",
+        "DALLE_COMMIT_ID = None\n",
+        "\n",
+        "# if the notebook crashes too often you can use dalle-mini instead by uncommenting below line\n",
+        "# DALLE_MODEL = \"dalle-mini/dalle-mini/mini-1:v0\"\n",
+        "\n",
+        "# VQGAN model\n",
+        "VQGAN_REPO = \"dalle-mini/vqgan_imagenet_f16_16384\"\n",
+        "VQGAN_COMMIT_ID = \"e93a26e7707683d349bf5d5c41c5b0ef69b677a9\""
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Yv-aR3t4Oe5v"
+      },
+      "outputs": [],
+      "source": [
+        "import jax\n",
+        "import jax.numpy as jnp\n",
+        "\n",
+        "# check how many devices are available\n",
+        "jax.local_device_count()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "92zYmvsQ38vL"
+      },
+      "outputs": [],
+      "source": [
+        "# Load models & tokenizer\n",
+        "from dalle_mini import DalleBart, DalleBartProcessor\n",
+        "from vqgan_jax.modeling_flax_vqgan import VQModel\n",
+        "from transformers import CLIPProcessor, FlaxCLIPModel\n",
+        "\n",
+        "# Load dalle-mini\n",
+        "model, params = DalleBart.from_pretrained(\n",
+        "    DALLE_MODEL, revision=DALLE_COMMIT_ID, dtype=jnp.float16, _do_init=False\n",
+        ")\n",
+        "\n",
+        "# Load VQGAN\n",
+        "vqgan, vqgan_params = VQModel.from_pretrained(\n",
+        "    VQGAN_REPO, revision=VQGAN_COMMIT_ID, _do_init=False\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "o_vH2X1tDtzA"
+      },
+      "source": [
+        "Model parameters are replicated on each device for faster inference."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "wtvLoM48EeVw"
+      },
+      "outputs": [],
+      "source": [
+        "from flax.jax_utils import replicate\n",
+        "\n",
+        "params = replicate(params)\n",
+        "vqgan_params = replicate(vqgan_params)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "0A9AHQIgZ_qw"
+      },
+      "source": [
+        "Model functions are compiled and parallelized to take advantage of multiple devices."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "sOtoOmYsSYPz"
+      },
+      "outputs": [],
+      "source": [
+        "from functools import partial\n",
+        "\n",
+        "# model inference\n",
+        "@partial(jax.pmap, axis_name=\"batch\", static_broadcasted_argnums=(3, 4, 5, 6))\n",
+        "def p_generate(\n",
+        "    tokenized_prompt, key, params, top_k, top_p, temperature, condition_scale\n",
+        "):\n",
+        "    return model.generate(\n",
+        "        **tokenized_prompt,\n",
+        "        prng_key=key,\n",
+        "        params=params,\n",
+        "        top_k=top_k,\n",
+        "        top_p=top_p,\n",
+        "        temperature=temperature,\n",
+        "        condition_scale=condition_scale,\n",
+        "    )\n",
+        "\n",
+        "\n",
+        "# decode image\n",
+        "@partial(jax.pmap, axis_name=\"batch\")\n",
+        "def p_decode(indices, params):\n",
+        "    return vqgan.decode_code(indices, params=params)"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "HmVN6IBwapBA"
+      },
+      "source": [
+        "Keys are passed to the model on each device to generate unique inference per device."
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "4CTXmlUkThhX"
+      },
+      "outputs": [],
+      "source": [
+        "import random\n",
+        "\n",
+        "# create a random key\n",
+        "seed = random.randint(0, 2**32 - 1)\n",
+        "key = jax.random.PRNGKey(seed)"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BrnVyCo81pij"
+      },
+      "source": [
+        "## 🖍 Text Prompt"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "rsmj0Aj5OQox"
+      },
+      "source": [
+        "Our model requires processing prompts."
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "YjjhUychOVxm"
+      },
+      "outputs": [],
+      "source": [
+        "from dalle_mini import DalleBartProcessor\n",
+        "\n",
+        "processor = DalleBartProcessor.from_pretrained(DALLE_MODEL, revision=DALLE_COMMIT_ID)"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BQ7fymSPyvF_"
+      },
+      "source": [
+        "Let's define a text prompt."
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "x_0vI9ge1oKr"
+      },
+      "outputs": [],
+      "source": [
+        "prompt = \"sunset over a lake in the mountains\""
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "VKjEZGjtO49k"
+      },
+      "outputs": [],
+      "source": [
+        "tokenized_prompt = processor([prompt])"
+      ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "-CEJBnuJOe5z"
+      },
+      "source": [
+        "Finally, we replicate the tokenized prompt onto each device."
+      ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "lQePgju5Oe5z"
+      },
+      "outputs": [],
+      "source": [
+        "tokenized_prompt = replicate(tokenized_prompt)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "phQ9bhjRkgAZ"
+      },
+      "source": [
+        "## 🎨 Generate images\n",
+        "\n",
+        "We generate encoded images with the DALL·E mini model and decode them with the VQGAN."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "d0wVkXpKqnHA"
+      },
+      "outputs": [],
+      "source": [
+        "# number of predictions\n",
+        "n_predictions = 8\n",
+        "\n",
+        "# We can customize generation parameters\n",
+        "gen_top_k = None\n",
+        "gen_top_p = None\n",
+        "temperature = None\n",
+        "cond_scale = 3.0"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "SDjEx9JxR3v8"
+      },
+      "outputs": [],
+      "source": [
+        "from flax.training.common_utils import shard_prng_key\n",
+        "import numpy as np\n",
+        "from PIL import Image\n",
+        "from tqdm.notebook import trange\n",
+        "\n",
+        "print(f\"Prompt: {prompt}\\n\")\n",
+        "# generate images\n",
+        "images = []\n",
+        "for i in trange(max(n_predictions // jax.device_count(), 1)):\n",
+        "    # get a new key\n",
+        "    key, subkey = jax.random.split(key)\n",
+        "    # generate encoded images (sequences of image tokens)\n",
+        "    encoded_images = p_generate(\n",
+        "        tokenized_prompt,\n",
+        "        shard_prng_key(subkey),\n",
+        "        params,\n",
+        "        gen_top_k,\n",
+        "        gen_top_p,\n",
+        "        temperature,\n",
+        "        cond_scale,\n",
+        "    )\n",
+        "    # remove BOS\n",
+        "    encoded_images = encoded_images.sequences[..., 1:]\n",
+        "    # decode images, clip values to [0, 1] and merge the device and batch axes\n",
+        "    decoded_images = p_decode(encoded_images, vqgan_params)\n",
+        "    decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))\n",
+        "    for decoded_img in decoded_images:\n",
+        "        img = Image.fromarray(np.asarray(decoded_img * 255, dtype=np.uint8))\n",
+        "        images.append(img)\n",
+        "        display(img)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tw02wG9zGmyB"
+      },
+      "source": [
+        "## 🏅 Optional: Rank images by CLIP score\n",
+        "\n",
+        "We can rank the generated images by how well they match the prompt according to CLIP.\n",
+        "\n",
+        "**Note: your session may crash if you don't have a subscription to Colab Pro.**"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "RGjlIW_f6GA0"
+      },
+      "outputs": [],
+      "source": [
+        "# CLIP model\n",
+        "CLIP_REPO = \"openai/clip-vit-base-patch32\"\n",
+        "CLIP_COMMIT_ID = None\n",
+        "\n",
+        "# Load CLIP\n",
+        "clip, clip_params = FlaxCLIPModel.from_pretrained(\n",
+        "    CLIP_REPO, revision=CLIP_COMMIT_ID, dtype=jnp.float16, _do_init=False\n",
+        ")\n",
+        "clip_processor = CLIPProcessor.from_pretrained(CLIP_REPO, revision=CLIP_COMMIT_ID)\n",
+        "clip_params = replicate(clip_params)\n",
+        "\n",
+        "# score images\n",
+        "@partial(jax.pmap, axis_name=\"batch\")\n",
+        "def p_clip(inputs, params):\n",
+        "    logits = clip(params=params, **inputs).logits_per_image\n",
+        "    return logits"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "FoLXpjCmGpju"
+      },
+      "outputs": [],
+      "source": [
+        "from flax.training.common_utils import shard\n",
+        "\n",
+        "# get clip scores (the prompt is repeated once per device so it can be sharded with the images)\n",
+        "clip_inputs = clip_processor(\n",
+        "    text=[prompt] * jax.device_count(),\n",
+        "    images=images,\n",
+        "    return_tensors=\"np\",\n",
+        "    padding=\"max_length\",\n",
+        "    max_length=77,\n",
+        "    truncation=True,\n",
+        ").data\n",
+        "logits = p_clip(shard(clip_inputs), clip_params)\n",
+        "logits = logits.squeeze().flatten()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4AAWRm70LgED"
+      },
+      "source": [
+        "Let's now display images ranked by CLIP score."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "zsgxxubLLkIu"
+      },
+      "outputs": [],
+      "source": [
+        "print(f\"Prompt: {prompt}\\n\")\n",
+        "for idx in logits.argsort()[::-1]:\n",
+        "    display(images[idx])\n",
+        "    print(f\"Score: {logits[idx]:.2f}\\n\")"
+      ]
+    }
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "collapsed_sections": [],
+      "machine_shape": "hm",
+      "name": "DALL·E mini - Inference pipeline.ipynb",
+      "provenance": [],
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3 (ipykernel)",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.9.7"
     }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}