{ "cells": [ { "cell_type": "markdown", "source": [ "1. Download the repo from GitHub https://github.com/clovaai/donut using the git command or a direct download.\n", "2. The base model configs for the document classification, document parsing, and document Q&A tasks are stored under /config.\n", "3. Make a copy of any of the YAML files, rename it as you like, and set your parameters.\n", "4. Prepare your dataset (train, validation, test) along with the JSONL files in the /dataset folder. You can write a program to generate the JSONL files from CSV files. Mind the format: one line per sample, and one JSONL file in each folder (train/validation/test).\n", "5. Refer to donut_training.ipynb to train your model. Use an A100 or V100 GPU to avoid troublesome settings and slow training. The trained model is stored under the /result folder.\n", "6. Run the trained model using this notebook.\n", "7. Don't change the versions of transformers and timm. Mismatched versions are a nightmare to debug unless you know exactly what you are doing." ], "metadata": { "id": "L5U1ACZZBxfh" } }, { "cell_type": "code", "source": [ "# Mount Google Drive and change to the donut folder\n", "from google.colab import drive\n", "drive.mount('/content/drive')\n", "%cd /content/drive/MyDrive/donut" ], "metadata": { "id": "-BZ2HFB9OtWP" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SJpD4AAj7qeZ" }, "outputs": [], "source": [ "#Install all necessary modules. 
Don't change the version number!\n", "!pip install transformers==4.25.1\n", "!pip install timm==0.5.4\n", "!pip install donut-python" ] }, { "cell_type": "code", "source": [ "# import necessary modules\n", "from donut import DonutModel\n", "from PIL import Image\n", "import torch\n", "import argparse" ], "metadata": { "id": "gSatjcDn5S89" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Input the default arguments\n", "parser = argparse.ArgumentParser()" ], "metadata": { "id": "RZSmy3Riz7ia" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "model = DonutModel.from_pretrained(\"./result/train_Booking/donut-booking-extract\")\n", "if torch.cuda.is_available():\n", " model.half()\n", " device = torch.device(\"cuda\")\n", " model.to(device)\n", "else:\n", " model.encoder.to(torch.bfloat16)\n", "\n", "model.eval()\n", "\n", "image = Image.open(\"/content/drive/MyDrive/donut/test/4.jpg\").convert(\"RGB\")\n", "\n", "with torch.no_grad():\n", " output = model.inference(image=image, prompt=\"\")\n", "output" ], "metadata": { "id": "dFfm72T93Z8G" }, "execution_count": null, "outputs": [] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "V100", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }