{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "1. Download the repo from Github https://github.com/clovaai/donut using git command or through direct download.\n",
        "2. (The base model config for document classification / document parsing / document Q&A tasks is stored under /config.\n",
        "3. Copy a copy of any YAML file, rename arbitarily and set your parameters.\n",
        "3. Prepare your dataset (train, validation, test) along with JSONL files on the /dataset folder. You can use program to generate JSONL files from csv files. Be remind of the format. One line per one data. One JSONL file in each folder (train/valdidation/test)\n",
        "4. Refer to donut_training.ipynb to train your model. Use A-100/V-100 GPU to avoid troublesome settings / slow training time. The trained model is stored under /result folder.\n",
        "5. Run the trained model using this ipynb file.\n",
        "6. Don't change the version of transformers and timm. It is a nightmare if you don't understand what you do."
      ],
      "metadata": {
        "id": "L5U1ACZZBxfh"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Enable Google Drive and Go to the donut folder\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n",
        "%cd /content/drive/MyDrive/donut"
      ],
      "metadata": {
        "id": "-BZ2HFB9OtWP"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SJpD4AAj7qeZ"
      },
      "outputs": [],
      "source": [
        "#Install all necessary modules. Don't change the version number!\n",
        "!pip install transformers==4.25.1\n",
        "!pip install timm==0.5.4\n",
        "!pip install donut-python"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# import necessary modules\n",
        "from donut import DonutModel\n",
        "from PIL import Image\n",
        "import torch\n",
        "import argparse"
      ],
      "metadata": {
        "id": "gSatjcDn5S89"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Input the default arguments\n",
        "parser = argparse.ArgumentParser()"
      ],
      "metadata": {
        "id": "RZSmy3Riz7ia"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model = DonutModel.from_pretrained(\"./result/train_Booking/donut-booking-extract\")\n",
        "if torch.cuda.is_available():\n",
        "    model.half()\n",
        "    device = torch.device(\"cuda\")\n",
        "    model.to(device)\n",
        "else:\n",
        "    model.encoder.to(torch.bfloat16)\n",
        "\n",
        "model.eval()\n",
        "\n",
        "image = Image.open(\"/content/drive/MyDrive/donut/test/4.jpg\").convert(\"RGB\")\n",
        "\n",
        "with torch.no_grad():\n",
        "  output = model.inference(image=image, prompt=\"<s_Booking>\")\n",
        "output"
      ],
      "metadata": {
        "id": "dFfm72T93Z8G"
      },
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "V100",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}