pierreguillou committed
Commit 62697e4
1 Parent(s): b509b08

Update app.py

Files changed (1): app.py (+48, -24)
app.py CHANGED
@@ -133,60 +133,84 @@ def app_outputs(uploaded_pdf):
     return msg, img_files[0], img_files[1], images[0], images[1], csv_files[0], csv_files[1], df[0], df[1]

 # gradio APP
-with gr.Blocks(title="Inference APP for Document Understanding at line level (v2 - LayoutXLM base)", css=".gradio-container") as demo:
+with gr.Blocks(title="Inference APP for Document Understanding at line level (v1 - LiLT base vs LayoutXLM base)", css=".gradio-container") as demo:
     gr.HTML("""
-    <div style="font-family:'Times New Roman', 'Serif'; font-size:26pt; font-weight:bold; text-align:center;"><h1>Inference APP for Document Understanding at line level (v2 - LayoutXLM base)</h1></div>
-    <div style="margin-top: 40px"><p>(03/05/2023) This Inference APP uses the <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-linelevel-ml384" target="_blank">model LayoutXLM base combined with XLM-RoBERTa base and finetuned on the dataset DocLayNet base at line level</a> (chunk size of 384 tokens).</p></div>
-    <div><p><a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://arxiv.org/abs/2104.08836" target="_blank">LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding</a> is a Document Understanding model that uses both layout and text to predict the labels of bounding boxes. Combined with the model <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/xlm-roberta-base" target="_blank">XLM-RoBERTa base</a>, this finetuned model can <b>understand any language</b>. Finetuned on the dataset <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/datasets/pierreguillou/DocLayNet-base" target="_blank">DocLayNet base</a>, it can <b>classify any bounding box (and its OCR text) into one of 11 labels</b> (Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title).</p></div>
-    <div><p>The model relies on an external OCR engine to get the words and bounding boxes from the document image. This APP therefore first runs an OCR engine (<a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://github.com/madmaze/pytesseract#python-tesseract" target="_blank">PyTesseract</a>) to get the bounding boxes, then runs LayoutXLM base (already finetuned on the dataset DocLayNet base at line level) on the individual tokens, and finally visualizes the result at line level!</p></div>
-    <div><p><b>It lets you get all pages of any PDF (in any language) with bounding boxes labeled at line level, together with the associated dataframes of labeled data (bounding boxes, texts, labels) :-)</b></p></div>
-    <div><p>However, the inference time per page can be high when running the model on CPU due to the number of line predictions to be made. Therefore, to avoid running this APP for too long, <b>only the first 2 pages are processed by this APP</b>. If you want to increase this limit, you can either clone this APP in Hugging Face Space (or run its <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://github.com/piegu/language-models/blob/master/Gradio_inference_on_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb" target="_blank">notebook</a> on your own platform) and change the value of the parameter <code>max_imgboxes</code>, or run the inference notebook "<a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://github.com/piegu/language-models/blob/master/inference_on_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb" target="_blank">Document AI | Inference at line level with a Document Understanding model (LayoutXLM base fine-tuned on DocLayNet dataset)</a>" on your own platform, as it does not have this limit.</p></div>
-    <div style="margin-top: 20px"><p>More information about the DocLayNet datasets, the finetuning of the model, and this APP can be found in the following blog posts:</p>
-    <ul><li>(03/05/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="" target="_blank">Document AI | Inference APP and fine-tuning notebook for Document Understanding at line level with LayoutXLM base</a></li><li>(02/14/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-inference-app-for-document-understanding-at-line-level-a35bbfa98893" target="_blank">Document AI | Inference APP for Document Understanding at line level</a></li><li>(02/10/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-document-understanding-model-at-line-level-with-lilt-tesseract-and-doclaynet-dataset-347107a643b8" target="_blank">Document AI | Document Understanding model at line level with LiLT, Tesseract and DocLayNet dataset</a></li><li>(01/31/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-doclaynet-image-viewer-app-3ac54c19956" target="_blank">Document AI | DocLayNet image viewer APP</a></li><li>(01/27/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-processing-of-doclaynet-dataset-to-be-used-by-layout-models-of-the-hugging-face-hub-308d8bd81cdb" target="_blank">Document AI | Processing of DocLayNet dataset to be used by layout models of the Hugging Face hub (finetuning, inference)</a></li></ul></div>
+    <div style="font-family:'Times New Roman', 'Serif'; font-size:26pt; font-weight:bold; text-align:center;"><h1>Inference APP for Document Understanding at line level (v1 - LiLT base vs LayoutXLM base)</h1></div>
+    <div style="margin-top: 40px"><p>(03/08/2023) This Inference APP compares, on the first PDF page only, 2 Document Understanding models finetuned on the dataset <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/datasets/pierreguillou/DocLayNet-base" target="_blank">DocLayNet base</a> at line level (chunk size of 384 tokens): <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-linelevel-ml384" target="_blank">LiLT base combined with XLM-RoBERTa base</a> and <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-linelevel-ml384" target="_blank">LayoutXLM base combined with XLM-RoBERTa base</a>.</p></div>
+    <div><p>To test these 2 models separately, use their corresponding APPs on Hugging Face Spaces: <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-v1" target="_blank">LiLT base APP (v1)</a> and <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-v2" target="_blank">LayoutXLM base APP (v2)</a>.</p></div><div style="margin-top: 20px"><p>Links to Document Understanding APPs:</p><ul><li>Line level: <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-v1" target="_blank">v1 (LiLT base)</a> | <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-v2" target="_blank">v2 (LayoutXLM base)</a> | <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-LiLT-base-LayoutXLM-base-v1" target="_blank">v1 (LiLT base vs LayoutXLM base)</a></li><li>Paragraph level: <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-paragraphlevel-v1" target="_blank">v1 (LiLT base)</a></li></ul></div><div style="margin-top: 20px"><p>More information about the DocLayNet datasets, the finetuning of the model, and this APP can be found in the following blog posts:</p><ul><li>(03/05/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="" target="_blank">Document AI | Inference APP and fine-tuning notebook for Document Understanding at line level with LayoutXLM base</a></li><li>(02/14/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-inference-app-for-document-understanding-at-line-level-a35bbfa98893" target="_blank">Document AI | Inference APP for Document Understanding at line level</a></li><li>(02/10/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-document-understanding-model-at-line-level-with-lilt-tesseract-and-doclaynet-dataset-347107a643b8" target="_blank">Document AI | Document Understanding model at line level with LiLT, Tesseract and DocLayNet dataset</a></li><li>(01/31/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-doclaynet-image-viewer-app-3ac54c19956" target="_blank">Document AI | DocLayNet image viewer APP</a></li><li>(01/27/2023) <a style="text-decoration: none; border-bottom: #64b5f6 0.125em solid; color: #64b5f6" href="https://medium.com/@pierre_guillou/document-ai-processing-of-doclaynet-dataset-to-be-used-by-layout-models-of-the-hugging-face-hub-308d8bd81cdb" target="_blank">Document AI | Processing of DocLayNet dataset to be used by layout models of the Hugging Face hub (finetuning, inference)</a></li></ul></div>
     """)
     with gr.Row():
         pdf_file = gr.File(label="PDF")
     with gr.Row():
-        submit_btn = gr.Button(f"Display first {max_imgboxes} labeled PDF pages")
-        reset_btn = gr.Button(value="Clear")
+        submit_btn = gr.Button(f"Get layout detection by LiLT and LayoutXLM on the first PDF page")
+        reset_btn = gr.Button(value="Clear")
     with gr.Row():
-        output_msg = gr.Textbox(label="Output message")
+        output_messages = []
+        with gr.Column():
+            output_msg = gr.Textbox(label="LiLT output message")
+            output_messages.append(output_msg)
+        with gr.Column():
+            output_msg = gr.Textbox(label="LayoutXLM output message")
+            output_messages.append(output_msg)
     with gr.Row():
         fileboxes = []
-        for num_page in range(max_imgboxes):
-            file_path = gr.File(visible=True, label=f"Image file of the PDF page n°{num_page}")
+        with gr.Column():
+            file_path = gr.File(visible=True, label=f"LiLT image file")
+            fileboxes.append(file_path)
+        with gr.Column():
+            file_path = gr.File(visible=True, label=f"LayoutXLM image file")
             fileboxes.append(file_path)
     with gr.Row():
         imgboxes = []
-        for num_page in range(max_imgboxes):
-            img = gr.Image(type="pil", label=f"Image of the PDF page n°{num_page}")
+        with gr.Column():
+            img = gr.Image(type="pil", label=f"LiLT Image")
+            imgboxes.append(img)
+        with gr.Column():
+            img = gr.Image(type="pil", label=f"LayoutXLM Image")
             imgboxes.append(img)
     with gr.Row():
         csvboxes = []
-        for num_page in range(max_imgboxes):
-            csv = gr.File(visible=True, label=f"CSV file at line level (page {num_page})")
+        with gr.Column():
+            csv = gr.File(visible=True, label=f"LiLT CSV file at line level")
+            csvboxes.append(csv)
+        with gr.Column():
+            csv = gr.File(visible=True, label=f"LayoutXLM CSV file at line level")
             csvboxes.append(csv)
     with gr.Row():
         dfboxes = []
-        for num_page in range(max_imgboxes):
+        with gr.Column():
             df = gr.Dataframe(
                 headers=["bounding boxes", "texts", "labels"],
                 datatype=["str", "str", "str"],
                 col_count=(3, "fixed"),
                 visible=True,
-                label=f"Data of page {num_page}",
+                label=f"LiLT data",
+                type="pandas",
+                wrap=True
+            )
+            dfboxes.append(df)
+        with gr.Column():
+            df = gr.Dataframe(
+                headers=["bounding boxes", "texts", "labels"],
+                datatype=["str", "str", "str"],
+                col_count=(3, "fixed"),
+                visible=True,
+                label=f"LayoutXLM data",
                 type="pandas",
                 wrap=True
             )
             dfboxes.append(df)

-    outputboxes = [output_msg] + fileboxes + imgboxes + csvboxes + dfboxes
+    outputboxes = output_messages + fileboxes + imgboxes + csvboxes + dfboxes
+
     submit_btn.click(app_outputs, inputs=[pdf_file], outputs=outputboxes)
+
+    # https://github.com/gradio-app/gradio/pull/2044/files#diff-a91dd2749f68bb7d0099a0f4079a4fd2d10281e299e7b451cb1bb876a7c21975R91
     reset_btn.click(
-        lambda: [pdf_file.update(value=None), output_msg.update(value=None)] + [filebox.update(value=None) for filebox in fileboxes] + [imgbox.update(value=None) for imgbox in imgboxes] + [csvbox.update(value=None) for csvbox in csvboxes] + [dfbox.update(value=None) for dfbox in dfboxes],
+        lambda: [pdf_file.update(value=None)] + [output_msg.update(value=None) for output_msg in output_messages] + [filebox.update(value=None) for filebox in fileboxes] + [imgbox.update(value=None) for imgbox in imgboxes] + [csvbox.update(value=None) for csvbox in csvboxes] + [dfbox.update(value=None) for dfbox in dfboxes],
         inputs=[],
-        outputs=[pdf_file, output_msg] + fileboxes + imgboxes + csvboxes + dfboxes,
-    )
+        outputs=[pdf_file] + output_messages + fileboxes + imgboxes + csvboxes + dfboxes
+    )

     gr.Examples(
         [["files/example.pdf"]],
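
The description removed by this commit notes that the model relies on an external OCR engine (PyTesseract) to get the words and bounding boxes from the page image. A minimal sketch of that OCR step, assuming pytesseract and Pillow are installed; the helper name ocr_words_and_boxes and the image path are illustrative, not taken from app.py:

import pytesseract
from PIL import Image

def ocr_words_and_boxes(image):
    # image_to_data returns word-level text, coordinates and line indices
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words, boxes, line_ids = [], [], []
    for i, word in enumerate(data["text"]):
        if word.strip():  # skip empty OCR cells
            words.append(word)
            boxes.append([data["left"][i], data["top"][i],
                          data["left"][i] + data["width"][i],
                          data["top"][i] + data["height"][i]])
            # (block_num, par_num, line_num) identifies the line a word belongs to
            line_ids.append((data["block_num"][i], data["par_num"][i], data["line_num"][i]))
    return words, boxes, line_ids

# a page image previously extracted from the uploaded PDF (path illustrative)
words, boxes, line_ids = ocr_words_and_boxes(Image.open("page_0.png"))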
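
Both model links above mention a chunk size of 384 tokens: a page whose text exceeds the model's input window is split into overlapping chunks before inference. A sketch of such chunking with a Hugging Face fast tokenizer; the xlm-roberta-base tokenizer matches the models' text backbone, but the stride value is an assumption, not a setting read from app.py:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def chunk_page(words, max_length=384, stride=128):
    # Tokenize pre-split OCR words and let the tokenizer emit overlapping
    # windows of at most max_length tokens each.
    encoding = tokenizer(
        words,
        is_split_into_words=True,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
    )
    # encoding.word_ids(i) maps the tokens of chunk i back to word indices,
    # which is how each token can be paired with its bounding box.
    return encoding

words = ["Hello", "world"] * 300  # stand-in for a page of OCR words
encoding = chunk_page(words)
print(len(encoding["input_ids"]), "chunk(s) of at most 384 tokens")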
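
The models classify individual tokens, while the APP displays one label per line. One common way to bridge the two is a majority vote of the token predictions within each line; this aggregation is an assumption for illustration, not code taken from app.py:

from collections import Counter

def labels_per_line(token_line_ids, token_labels):
    # token_line_ids[i] is the line the i-th token belongs to,
    # token_labels[i] the label predicted for that token.
    votes = {}
    for line_id, label in zip(token_line_ids, token_labels):
        votes.setdefault(line_id, Counter())[label] += 1
    # keep the most frequent label of each line
    return {line_id: counter.most_common(1)[0][0] for line_id, counter in votes.items()}

print(labels_per_line([0, 0, 0, 1], ["Text", "Text", "Table", "Title"]))
# -> {0: 'Text', 1: 'Title'}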
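
The reworked reset_btn.click follows the pattern from the Gradio pull request linked in the code comment: the lambda returns one update(value=None) per component, in exactly the same order as the outputs list, which empties every component at once. A minimal self-contained sketch of that pattern, assuming Gradio 3.x where gr.update() is the generic component update:

import gradio as gr

with gr.Blocks() as demo:
    textbox = gr.Textbox(label="Message")
    image = gr.Image(type="pil", label="Page")
    clear_btn = gr.Button("Clear")
    # One update per output component, matching the order of `outputs`.
    clear_btn.click(
        lambda: [gr.update(value=None), gr.update(value=None)],
        inputs=[],
        outputs=[textbox, image],
    )

demo.launch()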