uartimcs committed
Commit: dacdaca
1 Parent(s): 6d74987

Upload donut_simple.ipynb

Files changed (1)
  donut_simple.ipynb (+12, -10)
donut_simple.ipynb CHANGED
@@ -3,11 +3,13 @@
  {
  "cell_type": "markdown",
  "source": [
- "1. Download the donut folder from Github https://github.com/clovaai/donut\n",
- "2. Copy a config file in folder and change the name to hold your configuration.\n",
- "3. Place your dataset (train, validation, test) along with JSONL files on the dataset folder.\n",
- "4. Refer to donut_training.ipynb to train your model. Use A-100/V-100 GPU to avoid troublesome settings / slow training time.\n",
- "5. Run the trained model using this ipynb file."
+ "1. Download the repo from GitHub (https://github.com/clovaai/donut) with a git command or as a direct download.\n",
+ "2. The base model configs for the document classification / document parsing / document Q&A tasks are stored under /config.\n",
+ "3. Copy one of the YAML files, rename it as you like and set your parameters.\n",
+ "4. Prepare your dataset (train, validation, test) along with JSONL files in the /dataset folder. You can write a program to generate the JSONL files from CSV files. Mind the format: one record per line, and one JSONL file in each split folder (train/validation/test).\n",
+ "5. Refer to donut_training.ipynb to train your model. Use an A100/V100 GPU to avoid troublesome settings and slow training. The trained model is stored under the /result folder.\n",
+ "6. Run the trained model using this ipynb file.\n",
+ "7. Don't change the versions of transformers and timm; it is a nightmare unless you know exactly what you are doing."
  ],
  "metadata": {
  "id": "L5U1ACZZBxfh"
@@ -47,7 +49,8 @@
  "# import necessary modules\n",
  "from donut import DonutModel\n",
  "from PIL import Image\n",
- "import torch"
+ "import torch\n",
+ "import argparse"
  ],
  "metadata": {
  "id": "gSatjcDn5S89"
@@ -58,11 +61,11 @@
  {
  "cell_type": "code",
  "source": [
- "# Test the model with testing data. Just to initiate model.\n",
- "!python test.py --task_name Booking --dataset_name_or_path dataset/Booking --pretrained_model_name_or_path ./result/train_Booking/donut-booking-extract"
+ "# Input the default arguments\n",
+ "parser = argparse.ArgumentParser()"
  ],
  "metadata": {
- "id": "dyOv9Omo8dJU"
+ "id": "RZSmy3Riz7ia"
  },
  "execution_count": null,
  "outputs": []
@@ -70,7 +73,6 @@
  {
  "cell_type": "code",
  "source": [
- "\n",
  "model = DonutModel.from_pretrained(\"./result/train_Booking/donut-booking-extract\")\n",
  "if torch.cuda.is_available():\n",
  " model.half()\n",