---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---

# Document QA Model

This is a fine-tuned **document question-answering model** based on `layoutlmv3-base`. It combines OCR output (from PaddleOCR) with layout information to answer questions about structured content in scanned documents.

---

## Model Details

### Model Description

- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian 
- **License:** Apache-2.0 (inherited from base model)
- **Intended Use:** Extract answers to structured queries from scanned documents
- **Funding:** None; this project was completed independently.

---

## Model Sources

- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** Adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:** See [training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

---

## Uses

### Direct Use

This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
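
If the checkpoint is published on the Hub with its QA head intact, the quickest entry point may be the `document-question-answering` pipeline. A minimal sketch, with a placeholder repo ID, assuming Tesseract is installed for the pipeline's built-in OCR:

```python
from transformers import pipeline

# Repo ID is a placeholder; substitute the actual Hub ID of this model.
doc_qa = pipeline("document-question-answering",
                  model="your-username/document-qa-model")

# The pipeline runs Tesseract OCR by default; since this model was trained on
# PaddleOCR output, you can instead pass pre-computed `word_boxes`
# (see "How to Use" below).
result = doc_qa(image="invoice.png", question="What is the invoice number?")
print(result)  # e.g. [{"score": ..., "answer": ..., "start": ..., "end": ...}]
```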

### Out-of-Scope Use

- Not suitable for conversational QA
- Not suitable for images from which OCR cannot extract text

---

## Training Details

### Dataset

The dataset consisted of:
- **Images** of utility bills and documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions
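
For illustration, one adapted record might look like the following sketch; the field names here are hypothetical, not the dataset's actual schema:

```python
# Hypothetical shape of one training record (field names are illustrative;
# see the GitHub repo for the real preprocessing).
example = {
    "image": "utility_bill_0001.png",
    "words": ["Total", "amount", "due:", "$128.40"],      # PaddleOCR tokens
    "boxes": [[60, 700, 160, 730], [170, 700, 290, 730],  # one box per token
              [300, 700, 360, 730], [370, 700, 470, 730]],
    "question": "What is the total amount due?",
    "answer": {"text": "$128.40", "start": 3, "end": 3, "match_score": 1.0},
}
```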

### Training Procedure

- Preprocessing: PaddleOCR was used to extract tokens, positions, and structure (a sketch follows this list)
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: shown in the training chart below
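
The exact preprocessing lives in the GitHub repo; as a rough sketch of the OCR step (using PaddleOCR's classic `.ocr()` API, which varies across PaddleOCR versions), tokens and boxes can be extracted and normalized to the 0-1000 coordinate space LayoutLMv3 expects:

```python
from PIL import Image
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")
image_path = "utility_bill.png"  # placeholder file name
width, height = Image.open(image_path).size

words, boxes = [], []
# Each detected line is a (quadrilateral, (text, confidence)) pair.
for quad, (text, confidence) in ocr.ocr(image_path, cls=True)[0]:
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    words.append(text)
    # LayoutLMv3 expects [x0, y0, x1, y1] boxes normalized to 0-1000.
    boxes.append([
        int(1000 * min(xs) / width),
        int(1000 * min(ys) / height),
        int(1000 * max(xs) / width),
        int(1000 * max(ys) / height),
    ])
```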

### Training Metrics

- **F1 score (validation), loss, and learning rate** are shown in the combined chart:

![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

---

## Evaluation

### Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap with the ground truth
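
As a reference point, SQuAD-style token-overlap F1 can be computed as below; this is the standard formulation, not necessarily the exact evaluation script used here:

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """SQuAD-style token-overlap F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("$128.40 due on May 1", "$128.40"))  # ~0.33
```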

### Summary

The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms

---

## How to Use

- Full training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
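
For finer control over OCR, the processor can be fed pre-computed words and boxes instead of relying on built-in OCR. A minimal sketch, assuming the checkpoint exposes the standard extractive-QA head; the repo ID and file names are placeholders:

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForQuestionAnswering

model_id = "your-username/document-qa-model"  # placeholder Hub ID
# apply_ocr=False so we can supply our own PaddleOCR words and boxes.
processor = LayoutLMv3Processor.from_pretrained(model_id, apply_ocr=False)
model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_id)

image = Image.open("utility_bill.png").convert("RGB")
question = "What is the total amount due?"

# Words and 0-1000-normalized boxes, e.g. produced by the PaddleOCR step
# sketched under "Training Procedure".
words = ["Total", "amount", "due:", "$128.40"]
boxes = [[60, 700, 160, 730], [170, 700, 290, 730],
         [300, 700, 360, 730], [370, 700, 470, 730]]

encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# Decode the highest-scoring start/end span into an answer string.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(encoding["input_ids"][0][start : end + 1])
print(answer)
```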