Upload donut_simple.ipynb
- donut_simple.ipynb +12 -10
donut_simple.ipynb
CHANGED
@@ -3,11 +3,13 @@
|
|
3 |
{
|
4 |
"cell_type": "markdown",
|
5 |
"source": [
|
6 |
-
"1. Download the
|
7 |
-
"2.
|
8 |
-
"3.
|
9 |
-
"
|
10 |
-
"
|
|
|
|
|
11 |
],
|
12 |
"metadata": {
|
13 |
"id": "L5U1ACZZBxfh"
|
@@ -47,7 +49,8 @@
|
|
47 |
"# import necessary modules\n",
|
48 |
"from donut import DonutModel\n",
|
49 |
"from PIL import Image\n",
|
50 |
-
"import torch"
|
|
|
51 |
],
|
52 |
"metadata": {
|
53 |
"id": "gSatjcDn5S89"
|
@@ -58,11 +61,11 @@
|
|
58 |
{
|
59 |
"cell_type": "code",
|
60 |
"source": [
|
61 |
-
"#
|
62 |
-
"
|
63 |
],
|
64 |
"metadata": {
|
65 |
-
"id": "
|
66 |
},
|
67 |
"execution_count": null,
|
68 |
"outputs": []
|
@@ -70,7 +73,6 @@
|
|
70 |
{
|
71 |
"cell_type": "code",
|
72 |
"source": [
|
73 |
-
"\n",
|
74 |
"model = DonutModel.from_pretrained(\"./result/train_Booking/donut-booking-extract\")\n",
|
75 |
"if torch.cuda.is_available():\n",
|
76 |
" model.half()\n",
|
|
|
3 |
{
|
4 |
"cell_type": "markdown",
|
5 |
"source": [
|
6 |
+
"1. Download the repo from GitHub https://github.com/clovaai/donut using git or by direct download.\n",
|
7 |
+
"2. The base model config for document classification / document parsing / document Q&A tasks is stored under /config.\n",
|
8 |
+
"3. Make a copy of any YAML file, rename it as you like, and set your parameters.\n",
|
9 |
+
"4. Prepare your dataset (train, validation, test) along with JSONL files in the /dataset folder. You can use a program to generate the JSONL files from CSV files. Mind the format: one line per data sample, and one JSONL file in each folder (train/validation/test).\n",
|
10 |
+
"5. Refer to donut_training.ipynb to train your model. Use an A100/V100 GPU to avoid troublesome settings and slow training. The trained model is stored under the /result folder.\n",
|
11 |
+
"6. Run the trained model using this ipynb file.\n",
|
12 |
+
"7. Don't change the versions of transformers and timm. It is a nightmare if you don't understand what you are doing."
|
13 |
],
|
14 |
"metadata": {
|
15 |
"id": "L5U1ACZZBxfh"
|
|
|
49 |
"# import necessary modules\n",
|
50 |
"from donut import DonutModel\n",
|
51 |
"from PIL import Image\n",
|
52 |
+
"import torch\n",
|
53 |
+
"import argparse"
|
54 |
],
|
55 |
"metadata": {
|
56 |
"id": "gSatjcDn5S89"
|
|
|
61 |
{
|
62 |
"cell_type": "code",
|
63 |
"source": [
|
64 |
+
"# Input the default arguments\n",
|
65 |
+
"parser = argparse.ArgumentParser()"
|
66 |
],
|
67 |
"metadata": {
|
68 |
+
"id": "RZSmy3Riz7ia"
|
69 |
},
|
70 |
"execution_count": null,
|
71 |
"outputs": []
|
|
|
73 |
{
|
74 |
"cell_type": "code",
|
75 |
"source": [
|
|
|
76 |
"model = DonutModel.from_pretrained(\"./result/train_Booking/donut-booking-extract\")\n",
|
77 |
"if torch.cuda.is_available():\n",
|
78 |
" model.half()\n",
|
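The dataset-preparation step in the notebook's instructions mentions generating JSONL files from CSV files. A minimal sketch of such a converter follows, using only the Python standard library. It assumes the `metadata.jsonl` layout described in the Donut README (a `file_name` key plus a `ground_truth` key holding a JSON string with a `gt_parse` object). The function name `csv_to_metadata_jsonl`, the `image_column` parameter, and the CSV column names in the usage note are hypothetical and should be adapted to your own data.

```python
import csv
import json


def csv_to_metadata_jsonl(csv_path, jsonl_path, image_column="file_name"):
    """Convert a CSV file to a JSONL file with one record per CSV row.

    Assumption: each row names an image in `image_column`; every other
    column is treated as a field the model should learn to extract.
    Returns the number of records written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            file_name = row.pop(image_column)
            # The ground truth is stored as a JSON *string*, not a nested object.
            ground_truth = json.dumps({"gt_parse": row}, ensure_ascii=False)
            record = {"file_name": file_name, "ground_truth": ground_truth}
            # One JSON object per line.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, a CSV with the hypothetical columns `file_name,guest,check_in` would be converted once per split, writing one JSONL file into each of the train, validation, and test folders.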