---
tags:
- donut
- image-to-text
- invoices
---

# Overview

This repository contains a version of the Donut model fine-tuned for document understanding, specifically for invoice processing. Donut (Document Understanding Transformer) is an OCR-free model introduced in the paper *OCR-free Document Understanding Transformer* by Geewook Kim et al. and first released in the repository https://github.com/clovaai/donut.

The purpose of this fine-tuning is to improve Donut's performance on invoice analysis and extraction. The model was trained on a custom dataset of several hundred annotated invoices. The dataset is not included in this repository; details on its availability will be provided later.

# Model Details

Donut is a transformer-based encoder-decoder architecture that reads the document image directly, with no separate OCR step. Fine-tuning it on a custom dataset of invoices aims to improve its ability to accurately extract relevant information such as vendor details, billing information, line items, and totals.

[Demo can be found here](https://colab.research.google.com/drive/1zDvSysp24bCk60LR6172Z94eY1mRhKWF#scrollTo=f7RoSOEXUa6i)
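
To make the extraction target described under Model Details more concrete, here is a minimal sketch of how a Donut-style output sequence can be turned into structured data. Donut models emit fields wrapped in `<s_key>...</s_key>` tags, with `<sep/>` between repeated items. The field names below (`vendor`, `line_items`, `total`, and so on) and the sample sequence are hypothetical and are not taken from this model's actual schema. The parser is a simplified illustration, not the library's implementation; in practice, the Hugging Face `DonutProcessor.token2json` method serves this purpose.

```python
import re

# Matches one field: <s_key>content</s_key> (the closing tag must repeat the key).
TAG = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)
SEP = "<sep/>"  # separator Donut places between repeated items


def parse_value(text):
    """Parse a Donut-style tagged string into nested dicts, lists, and strings.

    Simplified sketch: assumes well-formed tags and no field nested
    inside another field of the same name.
    """
    if not TAG.search(text):
        # Leaf value: plain text, possibly several values joined by <sep/>.
        parts = [p.strip() for p in text.split(SEP)]
        return parts[0] if len(parts) == 1 else parts

    items, current, last_end = [], {}, 0
    for match in TAG.finditer(text):
        # A separator between two sibling fields starts a new item.
        if SEP in text[last_end:match.start()] and current:
            items.append(current)
            current = {}
        current[match.group(1)] = parse_value(match.group(2))
        last_end = match.end()
    items.append(current)
    # A single group is returned as a dict, several groups as a list.
    return items[0] if len(items) == 1 else items


# Hypothetical model output for one invoice (field names are assumptions).
sequence = (
    "<s_vendor><s_name>ACME Corp</s_name></s_vendor>"
    "<s_line_items>"
    "<s_description>Widget</s_description><s_quantity>2</s_quantity>"
    "<sep/>"
    "<s_description>Gadget</s_description><s_quantity>1</s_quantity>"
    "</s_line_items>"
    "<s_total>42.00</s_total>"
)

invoice = parse_value(sequence)
print(invoice["vendor"]["name"])
print(len(invoice["line_items"]))
print(invoice["total"])
```

Note that a field with only one item comes back as a single dict rather than a one-element list, so downstream code that expects a list of line items may want to normalize that case.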