enandhag committed on
Commit
4f8f6ef
1 Parent(s): f912b4b

pushed gradio app

Dockerfile ADDED
@@ -0,0 +1,74 @@
+ # ==================================================================
+ # Base image
+ # ------------------------------------------------------------------
+ FROM nvidia/cuda:11.2.0-cudnn8-devel-ubuntu18.04
+
+ # ==================================================================
+ # git, text editors, cmake
+ # ------------------------------------------------------------------
+
+ RUN apt-get update -y && \
+     apt-get upgrade -y && \
+     APT_INSTALL="apt-get install -y" && \
+     APT_INSTALL_NIR="apt-get install -y --no-install-recommends" && \
+     PIP_INSTALL="python -m pip --no-cache-dir install" && \
+     GIT_CLONE="git clone" && \
+     DEBIAN_FRONTEND=noninteractive $APT_INSTALL_NIR \
+         apt && \
+     DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
+         git-core \
+         ca-certificates \
+         cmake \
+         wget \
+         vim \
+         nano \
+         unzip \
+         ffmpeg \
+         libsm6 libxext6 libxrender-dev \
+         libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-tools \
+         build-essential && \
+     # ==================================================================
+     # python, pip
+     # ------------------------------------------------------------------
+     # rm -rf /var/lib/apt/lists/* \
+     #     /etc/apt/sources.list.d/cuda.list \
+     #     /etc/apt/sources.list.d/nvidia-ml.list && \
+     apt-get update -y && \
+     apt-get upgrade -y && \
+     DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
+         software-properties-common && \
+     apt-get update && \
+     DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
+         python3.7 \
+         python3.7-dev \
+         python-tk \
+         python3-tk \
+         python3.7-tk \
+         python3-pip && \
+     ln -s /usr/bin/python3.7 /usr/local/bin/python3 && \
+     ln -s /usr/bin/python3.7 /usr/local/bin/python && \
+     python3.7 -m pip install pip --upgrade
+
+ # ==================================================================
+ # Tools and dependencies
+ # ------------------------------------------------------------------
+ RUN python -m pip install \
+     setuptools==41.0.0 \
+     transformers[sentencepiece] \
+     numpy \
+     h5py \
+     scipy \
+     pandas \
+     matplotlib \
+     datasets \
+     pillow \
+     jupyter \
+     scikit-learn \
+     tqdm \
+     torch \
+     torchvision \
+     pytesseract \
+     pdf2image \
+     img2pdf \
+     jupyterlab \
+     timm
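
As a quick sanity check, one might confirm that the CUDA-enabled PyTorch stack baked into this image imports cleanly. This is a hedged sketch, not part of the commit; the script name `sanity_check.py` and the idea of running it inside the container built by `docker_build.sh` below are assumptions.

```python
# sanity_check.py -- hypothetical helper, not part of this commit.
# Run inside the built container to confirm the pip stack from the
# Dockerfile imports and that PyTorch can see the GPU.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # True only with the NVIDIA runtime attached
```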
README.md CHANGED
@@ -1,12 +1,11 @@
- ---
- title: DocAI
- emoji: 🐢
- colorFrom: green
- colorTo: gray
- sdk: gradio
- sdk_version: 3.14.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ChequeEasy
+
+ ChequeEasy is a project that aims to simplify the cheque approval process, making it easier for both bank officials and customers.
+
+ This project leverages the Donut model proposed in the paper <a href="https://arxiv.org/abs/2111.15664/">OCR-free Document Understanding Transformer</a> to parse the required data from cheques.
+
+ Donut is based on a simple transformer encoder-decoder architecture. Its main USP is that it is an OCR-free approach to Visual Document Understanding (VDU), and it can perform tasks like document classification, information extraction, and VQA.
+
+ OCR-based techniques come with several limitations, such as requiring additional downstream models, a limited understanding of document structure, and the need for hand-crafted rules for information extraction.
+ Donut does away with these OCR-specific limitations. The model for this project was trained on a subset of this <a href="https://www.kaggle.com/datasets/medali1992/cheque-images/">Kaggle dataset</a>; the original dataset contains cheque images from 10 different banks.
 
app.py ADDED
@@ -0,0 +1,65 @@
+ import os
+ import glob
+ import gradio as gr
+ from scripts.predict import parse_cheque_with_donut
+
+ ## Create the list of examples to be loaded
+ example_list = glob.glob("data/*")
+ example_list = list(map(lambda el: [el], example_list))
+
+ demo = gr.Blocks()
+
+ with demo:
+
+     gr.Markdown("# **<p align='center'>ChequeEasy: Banking made easy </p>**")
+     gr.Markdown(
+         'ChequeEasy is a project that aims to simplify the cheque approval process, making it easier for both bank officials and customers. \
+         This project leverages the Donut model proposed in the paper <a href="https://arxiv.org/abs/2111.15664/"> OCR-free Document Understanding Transformer </a> to parse the required data from cheques. '
+         'Donut is based on a simple transformer encoder-decoder architecture. Its main USP is that it is an OCR-free approach to Visual Document Understanding (VDU) and can perform tasks like document classification, information extraction, as well as VQA. \
+         OCR-based techniques come with several limitations, such as requiring additional downstream models, a limited understanding of document structure, and the need for hand-crafted rules for information extraction. \
+         Donut does away with these OCR-specific limitations. The model for this project was trained on a subset of this <a href="https://www.kaggle.com/datasets/medali1992/cheque-images/"> Kaggle dataset </a>. The original dataset contains cheque images from 10 different banks.'
+     )
+
+     with gr.Tabs():
+
+         with gr.TabItem("Cheque Parser"):
+             gr.Markdown(
+                 "This module extracts the details filled in by a bank customer on a cheque. At present the model is trained to extract details like Payee Name, Amount in Words, Amount in Figures, and Bank Name. \
+                 It can be further trained to parse additional details like MICR Code, Cheque Number, Account Number, etc."
+             )
+             with gr.Box():
+                 gr.Markdown("**Upload Cheque**")
+                 input_image_parse = gr.Image(type="filepath", label="Input Cheque")
+             with gr.Box():
+                 gr.Markdown("**Parsed Cheque Data**")
+
+                 payee_name = gr.Textbox(label="Payee Name")
+                 amt_in_words = gr.Textbox(label="Legal Amount")
+                 amt_in_figures = gr.Textbox(label="Courtesy Amount")
+                 bank_name = gr.Textbox(label="Bank Name")
+
+             with gr.Box():
+                 gr.Markdown("**Predict**")
+                 with gr.Row():
+                     parse_cheque = gr.Button("Call Donut 🍩")
+
+             with gr.Column():
+                 gr.Examples(
+                     example_list,
+                     [input_image_parse],
+                     [payee_name, amt_in_words, amt_in_figures, bank_name],
+                     parse_cheque_with_donut,
+                     cache_examples=False,
+                 )
+
+             parse_cheque.click(
+                 parse_cheque_with_donut,
+                 inputs=input_image_parse,
+                 outputs=[payee_name, amt_in_words, amt_in_figures, bank_name],
+             )
+
+     gr.Markdown(
+         '\n Solution built by: <a href="https://github.com/Nandhagopalan">Nandhagopalan Elangovan</a>'
+     )
+
+ demo.launch()
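
Both the "Call Donut 🍩" button and the examples widget route through `parse_cheque_with_donut`, so the pipeline can be smoke-tested without launching the UI. A minimal sketch (hypothetical, not part of this commit) using one of the sample cheques added under `data/` in this commit:

```python
# smoke_test.py -- hypothetical snippet, not part of this commit.
# Calls the same function the Gradio button is wired to, bypassing the UI.
from scripts.predict import parse_cheque_with_donut

# data/103.jpg is one of the sample cheques added in this commit
payee, legal_amount, courtesy_amount, bank = parse_cheque_with_donut("data/103.jpg")
print("Payee:", payee)
print("Legal amount:", legal_amount)
print("Courtesy amount:", courtesy_amount)
print("Bank:", bank)
```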
data/1012.jpg ADDED
data/103.jpg ADDED
data/1031.jpg ADDED
data/1038.jpg ADDED
data/1046.jpg ADDED
docker_build.sh ADDED
@@ -0,0 +1,5 @@
+ docker build -t harbor.hpc.ford.com/enandhag/docai:v1 \
+     --build-arg https_proxy=http://internet.ford.com:83/ \
+     --build-arg http_proxy=http://internet.ford.com:83/ \
+     --build-arg no_proxy=.ford.com,localhost,127.0.0.1 \
+     -f Dockerfile .
poetry.lock ADDED
The diff for this file is too large to render.
 
pyproject.toml ADDED
@@ -0,0 +1,20 @@
+ [tool.poetry]
+ name = "docai"
+ version = "0.1.0"
+ description = ""
+ authors = ["enandhag <enandhag@ford.com>"]
+ readme = "README.md"
+
+ [tool.poetry.dependencies]
+ python = "^3.8"
+ transformers = {extras = ["sentencepiece"], version = "^4.25.1"}
+ datasets = "^2.8.0"
+ torch = "^1.13.1"
+ gradio = "^3.14.0"
+ numpy = "^1.24.0"
+ jupyter = "^1.0.0"
+
+ [build-system]
+ requires = ["poetry-core"]
+ build-backend = "poetry.core.masonry.api"
scripts/__pycache__/predict.cpython-38.pyc ADDED
Binary file (1.49 kB).
 
scripts/predict.py ADDED
@@ -0,0 +1,60 @@
+ from utils.donut_utils import (
+     load_donut_model_and_processor,
+     prepare_data_using_processor,
+     load_image,
+ )
+ import re
+
+ CHEQUE_PARSER_MODEL = "Nandhu/DocAI"
+ TASK_PROMPT = "<s>"
+
+
+ def parse_cheque_with_donut(input_image_path):
+     image = load_image(input_image_path)
+
+     donut_processor, model = load_donut_model_and_processor(CHEQUE_PARSER_MODEL)
+
+     cheque_image_tensor, input_for_decoder = prepare_data_using_processor(
+         donut_processor, image, TASK_PROMPT
+     )
+
+     outputs = model.generate(
+         cheque_image_tensor,
+         decoder_input_ids=input_for_decoder,
+         max_length=model.decoder.config.max_position_embeddings,
+         early_stopping=True,
+         pad_token_id=donut_processor.tokenizer.pad_token_id,
+         eos_token_id=donut_processor.tokenizer.eos_token_id,
+         use_cache=True,
+         num_beams=1,
+         bad_words_ids=[[donut_processor.tokenizer.unk_token_id]],
+         return_dict_in_generate=True,
+         output_scores=True,
+     )
+
+     decoded_output_sequence = donut_processor.batch_decode(outputs.sequences)[0]
+
+     ## strip the EOS and PAD special tokens from the decoded sequence
+     extracted_cheque_details = decoded_output_sequence.replace(
+         donut_processor.tokenizer.eos_token, ""
+     ).replace(donut_processor.tokenizer.pad_token, "")
+
+     ## remove the task prompt from the token sequence
+     cleaned_cheque_details = re.sub(
+         r"<.*?>", "", extracted_cheque_details, count=1
+     ).strip()
+
+     ## generate an ordered json from the output token sequence
+     cheque_details_json = donut_processor.token2json(cleaned_cheque_details)
+     print("cheque_details_json:", cheque_details_json)
+
+     ## extract the required fields from the predicted json
+     amt_in_words = cheque_details_json["VALUE_LETTERS"]
+     amt_in_figures = cheque_details_json["VALUE_NUMBERS"]
+     payee_name = cheque_details_json["USER2NAME"]
+     bank_name = cheque_details_json["BANK_NAME"]
+
+     return (payee_name, amt_in_words, amt_in_figures, bank_name)
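
Note that the dictionary lookups above assume the model always emits all four fields; if a decode drops one, the function raises `KeyError`. A more defensive variant could fall back to a placeholder. This is a sketch, not part of the commit; the helper name `extract_fields` is hypothetical, while the keys are the ones read above.

```python
# Hypothetical defensive version of the field extraction above; returns a
# placeholder instead of raising KeyError when the model omits a field.
def extract_fields(cheque_details_json: dict):
    def get(key):
        return cheque_details_json.get(key, "<not found>")

    return (
        get("USER2NAME"),      # payee name
        get("VALUE_LETTERS"),  # legal amount (amount in words)
        get("VALUE_NUMBERS"),  # courtesy amount (amount in figures)
        get("BANK_NAME"),
    )
```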
utils/__pycache__/donut_utils.cpython-38.pyc ADDED
Binary file (1.09 kB).
 
utils/donut_utils.py ADDED
@@ -0,0 +1,31 @@
+ from transformers import DonutProcessor, VisionEncoderDecoderModel
+ from PIL import Image
+ import torch
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+
+ def load_image(image_path):
+     image = Image.open(image_path).convert("RGB")
+     return image
+
+
+ def load_donut_model_and_processor(trained_model_repo):
+     donut_processor = DonutProcessor.from_pretrained(trained_model_repo)
+     model = VisionEncoderDecoderModel.from_pretrained(trained_model_repo)
+     model.to(device)
+     return donut_processor, model
+
+
+ def prepare_data_using_processor(donut_processor, image, task_prompt):
+     ## Pass the image through the Donut processor's feature extractor and retrieve the image tensor
+     pixel_values = donut_processor(image, return_tensors="pt").pixel_values
+     pixel_values = pixel_values.to(device)
+
+     ## Tokenize the task prompt for the document (cheque) parsing task and retrieve the input_ids
+     decoder_input_ids = donut_processor.tokenizer(
+         task_prompt, add_special_tokens=False, return_tensors="pt"
+     )["input_ids"]
+     decoder_input_ids = decoder_input_ids.to(device)
+
+     return pixel_values, decoder_input_ids
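
One design note: `parse_cheque_with_donut` calls `load_donut_model_and_processor` on every invocation, so each button click reloads the model from the hub cache. A memoised wrapper would keep the processor/model pair in memory after the first call. This is a sketch, not part of the commit; the wrapper name is hypothetical.

```python
# Hypothetical caching wrapper, not part of this commit. The first call loads
# the model; subsequent calls with the same repo name reuse the cached pair.
from functools import lru_cache

@lru_cache(maxsize=1)
def load_donut_model_and_processor_cached(trained_model_repo: str):
    return load_donut_model_and_processor(trained_model_repo)
```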