What libraries can I use for Document Question Answering?

The transformers and transformers.js libraries are compatible with Document Question Answering.

What models can I use for Document Question Answering?

The impira/layoutlm-document-qa, impira/layoutlm-invoices, microsoft/udop-large, and google/pix2struct-docvqa-large models can be used for Document Question Answering.

What datasets can I use for Document Question Answering?

The HuggingFaceM4/Docmatix and eliolio/docvqa datasets can be used for Document Question Answering.

What metrics can I use for Document Question Answering?

The anls and exact-match metrics can be used for Document Question Answering.

Document Question Answering

Document Question Answering (also known as Document Visual Question Answering) is the task of answering questions about document images. Document question answering models take a (document, question) pair as input and return an answer in natural language. Models usually rely on multimodal features, combining the text, the positions of words (bounding boxes), and the image itself.
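To make the "position of words" feature concrete, here is a minimal sketch of how word bounding boxes are often prepared for layout-aware models. It assumes pixel boxes in (x0, y0, x1, y1) form and rescales them to a 0-1000 range, which is the convention used by the LayoutLM family; the function name normalize_box, the sample words, and the page size are illustrative, not part of any library.

```python
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixels, origin at the top-left corner.
Box = Tuple[int, int, int, int]


def normalize_box(box: Box, width: int, height: int, scale: int = 1000) -> List[int]:
    """Rescale a pixel bounding box to a resolution-independent 0..scale range."""
    x0, y0, x1, y1 = box
    return [
        int(scale * x0 / width),
        int(scale * y0 / height),
        int(scale * x1 / width),
        int(scale * y1 / height),
    ]


# Hypothetical OCR output for an 800x1000 pixel page.
words = ["Invoice", "Total:", "$120.00"]
pixel_boxes = [(50, 40, 210, 80), (50, 700, 150, 740), (400, 700, 520, 740)]
page_width, page_height = 800, 1000

# Each word is paired with its normalized position on the page.
features = [
    {"word": word, "box": normalize_box(box, page_width, page_height)}
    for word, box in zip(words, pixel_boxes)
]
print(features)
```

In practice, a model's processor usually handles this step, so the sketch is mainly useful for understanding what the layout input looks like.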

Inputs

Question

What is the idea behind the consumer relations efficiency team?

Document Question Answering Model

Output

Answer

Balance cost efficiency with quality customer service

About Document Question Answering

Use Cases

Document Question Answering models can be used to answer natural language questions about documents. Typically, document QA models consider textual, layout, and potentially visual information, which is useful when a question requires some understanding of the visual aspects of the document. Nevertheless, certain document QA models can work without document images. Hence the task is not limited to visually rich documents: users can also ask questions about spreadsheets, text PDFs, and similar files.

Document Parsing

One of the most popular use cases of document question answering models is the parsing of structured documents. For example, you can extract the name, address, and other information from a form. You can also use the model to extract information from a table, or even a resume.

Invoice Information Extraction

Another very popular use case is invoice information extraction. For example, you can extract the invoice number, the invoice date, the total amount, the VAT number, and the invoice recipient.
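As a rough sketch of how this could look in code, the helper below asks one question per invoice field and collects the top answer for each. The field names, the question wording, and the extract_fields helper are all hypothetical; the only assumption about the model is that it is a callable taking image and question keyword arguments and returning a list of dictionaries with an "answer" key, as the document-question-answering pipeline does.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical mapping from output field names to natural language questions.
INVOICE_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "invoice_date": "What is the invoice date?",
    "total_amount": "What is the total amount?",
    "vat_number": "What is the VAT number?",
    "recipient": "Who is the invoice recipient?",
}


def extract_fields(
    qa: Callable[..., List[Dict[str, Any]]],
    image: Any,
    questions: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Ask one question per field and keep the first answer, or None if there is none."""
    results: Dict[str, Optional[str]] = {}
    for field, question in questions.items():
        answers = qa(image=image, question=question)
        results[field] = answers[0]["answer"] if answers else None
    return results


# Possible usage with a real pipeline (downloads a model; not run here):
# from transformers import pipeline
# from PIL import Image
# qa = pipeline("document-question-answering", model="impira/layoutlm-invoices")
# print(extract_fields(qa, Image.open("your-invoice.png"), INVOICE_QUESTIONS))
```

Passing the model in as a callable keeps the helper easy to test with a stub and makes it simple to swap checkpoints.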

Inference

You can run inference with Document QA models in the 🤗 Transformers library using the document-question-answering pipeline. If no model checkpoint is given, the pipeline is initialized with impira/layoutlm-document-qa. The pipeline takes one or more questions and documents as input and returns the answers.
👉 Note that the question answering task solved here is extractive: the model extracts the answer from a context (the document).

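from transformers import pipeline
from PIL import Image

# Load the pipeline with an explicit checkpoint (here Donut, an OCR-free model)
pipe = pipeline("document-question-answering", model="naver-clova-ix/donut-base-finetuned-docvqa")

# Provide a question and a document image
question = "What is the purchase amount?"
image = Image.open("your-document.png")

# The pipeline returns a list of answer dictionaries
result = pipe(image=image, question=question)
print(result)

# Example output: [{'answer': '20,000$'}]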

Useful Resources

Would you like to learn more about Document QA? Awesome! Here are some curated resources that you may find helpful!

Documentation

Document question answering task guide

The contents of this page are contributed by Eliott Zemour and reviewed by Kwadwo Agyapon-Ntra and Ankur Goyal.

Compatible libraries

Transformers

Transformers.js

Models for Document Question Answering

impira/layoutlm-document-qa

Document Question Answering • Updated Mar 18, 2023 • 54.2k • 1.07k

Note: A robust document question answering model.

impira/layoutlm-invoices

Document Question Answering • Updated Mar 25, 2023 • 8.01k • 187

Note: A document question answering model specialized in invoices.

microsoft/udop-large

Image-Text-to-Text • Updated Mar 11, 2024 • 5.5k • 112

Note: A special model for OCR-free document question answering.

google/pix2struct-docvqa-large

Visual Question Answering • Updated May 19, 2023 • 227 • 31

Note: A powerful model for document question answering.

Datasets for Document Question Answering

HuggingFaceM4/Docmatix

Viewer • Updated Aug 26, 2024 • 2.55M • 14.1k • 254

Note: Largest document understanding dataset.

eliolio/docvqa

Updated Oct 11, 2022 • 62 • 3

Note: Dataset from the 2020 DocVQA challenge. The documents are taken from the UCSF Industry Documents Library.

Spaces using Document Question Answering

🦉

impira/docquery

Note: A robust document question answering application.

💸

impira/invoices

Note: An application that can answer questions from invoices.

🦀

merve/compare_docvqa_models

Note: An application to compare different document question answering models.

Metrics for Document Question Answering

anls: Average Normalized Levenshtein Similarity (ANLS) is the evaluation metric for the DocVQA challenge. It compares the predicted answer with the ground truth answer using edit distance, so it tolerates minor character recognition errors.
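The following is a minimal sketch of how ANLS can be computed, not an official scoring script. It assumes the commonly used formulation: answers are lowercased and stripped, the edit distance is normalized by the longer string, any similarity whose normalized distance reaches a threshold of 0.5 is set to 0, and the best match over all accepted ground truth answers is kept for each question. The function names are illustrative.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_similarity(prediction: str, truth: str, threshold: float = 0.5) -> float:
    """Return 1 - normalized edit distance, or 0 if the distance reaches the threshold."""
    pred, gold = prediction.strip().lower(), truth.strip().lower()
    if not pred and not gold:
        return 1.0
    distance = levenshtein(pred, gold) / max(len(pred), len(gold))
    return 1.0 - distance if distance < threshold else 0.0


def anls(predictions: Sequence[str], ground_truths: Sequence[List[str]], threshold: float = 0.5) -> float:
    """Average, over questions, of the best similarity against any accepted answer."""
    if not predictions:
        return 0.0
    scores = [
        max((normalized_similarity(pred, gold, threshold) for gold in golds), default=0.0)
        for pred, golds in zip(predictions, ground_truths)
    ]
    return sum(scores) / len(scores)


# One OCR-style slip ("O" instead of "0") still earns partial credit.
print(normalized_similarity("20,0O0$", "20,000$"))
```

The threshold is what separates a near miss caused by a recognition error from a genuinely wrong answer, which receives no credit.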

exact-match: Exact Match is based on a strict character-by-character match between the predicted answer and the ground truth answer. A correctly predicted answer scores 1; if even one character differs, it scores 0.
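A short sketch of Exact Match as described here, assuming a strict comparison with no normalization (some evaluation setups lowercase or strip answers first, which this sketch deliberately does not do). The function names are illustrative.

```python
from typing import Sequence


def exact_match(prediction: str, truth: str) -> int:
    """Return 1 only if the two strings are identical character for character."""
    return int(prediction == truth)


def exact_match_score(predictions: Sequence[str], truths: Sequence[str]) -> float:
    """Average Exact Match over a set of (prediction, ground truth) pairs."""
    if not predictions:
        return 0.0
    return sum(exact_match(p, t) for p, t in zip(predictions, truths)) / len(predictions)


# A single missing character turns the score for that answer into 0.
print(exact_match("20,000$", "20,000$"), exact_match("20,000", "20,000$"))
```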