Document Question Answering (also known as Document Visual Question Answering) is the task of answering questions about document images. Document question answering models take a (document, question) pair as input and return an answer in natural language. Models usually rely on multi-modal features, combining text, word positions (bounding boxes), and image features.

Inputs
###### Question

What is the idea behind the consumer relations efficiency team?

Output

Balance cost efficiency with quality customer service

## Use Cases

Document Question Answering models can be used to answer natural language questions about documents. Typically, document QA models consider textual, layout, and potentially visual information, which is useful when the question requires some understanding of the visual aspects of the document. Nevertheless, certain document QA models can work without document images. The task is therefore not limited to visually rich documents: users can also ask questions about spreadsheets, text-only PDFs, and similar sources.
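
To make the layout information mentioned above more concrete, here is a minimal sketch of how word positions are often prepared for layout-aware models. LayoutLM-style models conventionally expect bounding boxes scaled to a 0-1000 range that is independent of the page size. The words, pixel boxes, and page dimensions below are made-up stand-ins for OCR output, and `normalize_box` is a hypothetical helper, not a library function.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 coordinate range."""
    left, top, right, bottom = box
    return [
        int(1000 * left / width),
        int(1000 * top / height),
        int(1000 * right / width),
        int(1000 * bottom / height),
    ]


# Hypothetical OCR output for a 1240 x 1754 pixel page.
words = ["Invoice", "Total:", "$42.00"]
pixel_boxes = [(100, 50, 300, 90), (100, 1500, 220, 1540), (240, 1500, 380, 1540)]

# Pair each word with its normalized box: the text + layout input.
layout_input = [
    (word, normalize_box(box, width=1240, height=1754))
    for word, box in zip(words, pixel_boxes)
]
```

Each entry in `layout_input` pairs a word with its position on the page, which is the kind of text-plus-layout signal a model can use even when no pixel data is available.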

### Document Parsing

One of the most popular use cases of document question answering models is the parsing of structured documents. For example, you can extract the name, address, and other information from a form. You can also use the model to extract information from a table, or even a resume.

### Invoice Information Extraction

Another very popular use case is invoice information extraction. For example, you can extract the invoice number, the invoice date, the total amount, the VAT number, and the invoice recipient.
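
As a rough illustration of this use case, the sketch below asks one question per invoice field and keeps only answers above a confidence threshold. Everything here is an assumption made for illustration: the field names, the questions, the `min_score` value, and the `fake_ask` function, which stands in for a real model call and returns canned candidates with `answer` and `score` keys.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical field names mapped to the questions used to extract them.
INVOICE_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "invoice_date": "What is the invoice date?",
    "total_amount": "What is the total amount?",
}


def extract_fields(
    ask: Callable[[str], List[dict]],
    questions: Dict[str, str],
    min_score: float = 0.5,
) -> Dict[str, Optional[str]]:
    """Ask one question per field and keep the best answer above min_score."""
    results: Dict[str, Optional[str]] = {}
    for field, question in questions.items():
        candidates = ask(question)
        best = max(candidates, key=lambda c: c["score"], default=None)
        if best is not None and best["score"] >= min_score:
            results[field] = best["answer"]
        else:
            results[field] = None  # no confident answer for this field
    return results


def fake_ask(question: str) -> List[dict]:
    """Stand-in for a real model call, returning canned candidates."""
    canned = {
        "What is the invoice number?": [{"answer": "us-001", "score": 0.98}],
        "What is the invoice date?": [{"answer": "11/02/2019", "score": 0.91}],
        "What is the total amount?": [{"answer": "$154.06", "score": 0.32}],
    }
    return canned.get(question, [])


fields = extract_fields(fake_ask, INVOICE_QUESTIONS)
```

With a real document QA pipeline, `ask` could plausibly be a small wrapper such as `lambda q: pipe(image=image, question=q)`, assuming the pipeline returns a list of dictionaries with `answer` and `score` keys. Fields that come back as `None` can then be routed to manual review.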

## Inference

You can run inference with Document QA models in the 🤗 Transformers library using the `document-question-answering` pipeline. If no model checkpoint is given, the pipeline is initialized with `impira/layoutlm-document-qa`. The pipeline takes question(s) and document(s) as input and returns the answer.
👉 Note that the question answering task solved here is extractive: the model extracts the answer from a context (the document).

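from transformers import pipeline
from PIL import Image

pipe = pipeline("document-question-answering", model="impira/layoutlm-document-qa")  # same checkpoint as the pipeline default

question = "What is the purchase amount?"
image = Image.open("your-document.png")  # path to your own document image

pipe(image=image, question=question)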
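
To illustrate what "extractive" means in the note above, here is a simplified sketch of span selection. It assumes a model has produced one start score and one end score per word, and it picks the contiguous span with the highest combined score. The scores below are made up, `best_span` is a hypothetical helper, and real pipelines work on sub-word tokens with extra post-processing, so treat this as an illustration rather than the library's implementation.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 15,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid span."""
    best = (0, 0, float("-inf"))
    for start, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Toy document words and made-up per-word scores.
words = ["Total", "amount", "due:", "$154.06", "by", "March"]
start_scores = [0.1, 0.0, 0.2, 3.5, 0.1, 0.3]
end_scores = [0.0, 0.1, 0.3, 3.1, 0.2, 0.4]

start, end, score = best_span(start_scores, end_scores)
answer = " ".join(words[start : end + 1])
```

The answer is always a slice of the document's own words, which is why an extractive model cannot return text that does not appear in the document.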



## Useful Resources

### Notebooks

The contents of this page are contributed by Eliott Zemour and reviewed by Kwadwo Agyapon-Ntra and Ankur Goyal.

## Compatible libraries

Transformers

Note: A LayoutLM model for the document QA task, fine-tuned on DocVQA and SQuAD2.0.

Note: A model for the OCR-free document QA task. Donut model fine-tuned on DocVQA.