Fork of naver-clova-ix/donut-base-finetuned-cord-v2

This is a fork of naver-clova-ix/donut-base-finetuned-cord-v2 that implements a custom handler.py as an example of how to use Donut models with Inference Endpoints.


Donut (base-sized model, fine-tuned on CORD)

Donut model fine-tuned on CORD. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook Kim et al. and first released in this repository.

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size). The decoder then autoregressively generates text, conditioned on the encoder output.
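
The encode-then-generate flow described above can be sketched as a custom handler. This is a minimal sketch, not the handler.py shipped in this repository: it assumes the usual Inference Endpoints convention of an EndpointHandler class with __init__(path) and __call__(data), that data["inputs"] arrives as a decoded PIL image, and that the task prompt for this checkpoint is "<s_cord-v2>". The helper clean_sequence is a hypothetical name introduced for illustration.

```python
import re
from typing import Any, Dict


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Strip special tokens and the leading task-start token from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt (e.g. "<s_cord-v2>").
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Imported lazily so the module loads without the heavy dependencies.
        import torch
        from transformers import DonutProcessor, VisionEncoderDecoderModel

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.processor = DonutProcessor.from_pretrained(path)
        self.model = VisionEncoderDecoderModel.from_pretrained(path).to(self.device)
        # Assumed task prompt for the CORD v2 checkpoint.
        self.decoder_input_ids = self.processor.tokenizer(
            "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
        ).input_ids.to(self.device)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        image = data["inputs"]  # assumed to be a decoded PIL image
        # Encoder input: the image converted to a pixel tensor.
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        # Decoder: autoregressive generation conditioned on the encoder output.
        outputs = self.model.generate(
            pixel_values.to(self.device),
            decoder_input_ids=self.decoder_input_ids,
            max_length=self.model.decoder.config.max_position_embeddings,
            pad_token_id=self.processor.tokenizer.pad_token_id,
            eos_token_id=self.processor.tokenizer.eos_token_id,
            return_dict_in_generate=True,
        )
        tokenizer = self.processor.tokenizer
        text = self.processor.batch_decode(outputs.sequences)[0]
        text = clean_sequence(text, tokenizer.eos_token, tokenizer.pad_token)
        # Convert the tagged token sequence into a nested dict.
        return self.processor.token2json(text)
```

Keeping the string cleanup in its own function makes that step easy to check without loading the model.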

Use with Inference Endpoints

Hugging Face Inference Endpoints can work directly with binary data, which means we can send the document image to the endpoint as-is. We are going to use requests to send our requests (make sure you have it installed: pip install requests).
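
Since the body is raw bytes, the Content-Type header is what describes the payload to the endpoint. Below is a minimal sketch of building the headers defensively, so a file whose type cannot be guessed fails early on the client. The function name build_headers and the choice to reject non-image types are assumptions made for illustration, not part of the endpoint API.

```python
import mimetypes
from typing import Dict


def build_headers(path_to_image: str, token: str) -> Dict[str, str]:
    """Build request headers, refusing files that do not look like images."""
    # guess_type returns (type, encoding); type is None for unknown extensions.
    content_type, _ = mimetypes.guess_type(path_to_image)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"cannot determine an image content type for {path_to_image!r}")
    return {"Authorization": f"Bearer {token}", "Content-Type": content_type}
```

For sample.png this yields a Content-Type of image/png.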


Send requests with Python

load sample image

wget https://huggingface.co/philschmid/donut-base-finetuned-cord-v2/resolve/main/sample.png

send request to endpoint

import mimetypes

import requests as r

ENDPOINT_URL = ""  # URL of your endpoint
HF_TOKEN = ""  # token of the organization where you deployed your endpoint

def predict(path_to_image: str):
    # read the image as raw bytes
    with open(path_to_image, "rb") as i:
        b = i.read()
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        # guessed from the file extension, e.g. "image/png" for sample.png
        "Content-Type": mimetypes.guess_type(path_to_image)[0],
    }
    # send the bytes as the request body and return the parsed JSON response
    response = r.post(ENDPOINT_URL, headers=headers, data=b)
    return response.json()

prediction = predict(path_to_image="sample.png")

print(prediction)
# {'menu': [{'nm': '0571-1854 BLUS WANITA',
#   'unitprice': '@120.000',
#   'cnt': '1',
#   'price': '120,000'},
#  {'nm': '1002-0060 SHOPPING BAG', 'cnt': '1', 'price': '0'}],
# 'total': {'total_price': '120,000',
#  'changeprice': '0',
#  'creditcardprice': '120,000',
#  'menuqty_cnt': '1'}}
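
All values in the prediction come back as strings, so downstream code usually has to normalize them. Below is a minimal sketch of one such step, checking that the line-item prices add up to the reported total. The helper names are hypothetical. It assumes amounts are whole numbers that contain only thousands separators, as in the sample output above, and that a receipt with a single line item may return menu as a dict instead of a list.

```python
import re
from typing import Any, Dict


def parse_amount(text: str) -> int:
    """Keep only the digits, so '120,000' and '@120.000' both become 120000."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else 0


def items_match_total(prediction: Dict[str, Any]) -> bool:
    """Return True when the summed line-item prices equal total_price."""
    items = prediction.get("menu", [])
    if isinstance(items, dict):  # single line item not wrapped in a list (assumption)
        items = [items]
    line_sum = sum(parse_amount(item.get("price", "0")) for item in items)
    return line_sum == parse_amount(prediction["total"]["total_price"])
```

Applied to the sample prediction above, the two line items (120,000 and 0) sum to the reported total_price of 120,000.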

curl example

curl https://ak7gduay2ypyr9vp.us-east-1.aws.endpoints.huggingface.cloud \
-X POST \
--data-binary '@sample.png' \
-H "Authorization: Bearer XXX" \
-H "Content-Type: image/png"