---
license: mit
tags:
- donut
- image-to-text
- vision
- endpoints-template
---
# Fork of [naver-clova-ix/donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2)
> This is a fork of [naver-clova-ix/donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2) implementing a custom `handler.py` as an example of how to use `donut` models with [inference-endpoints](https://hf.co/inference-endpoints).
---
# Donut (base-sized model, fine-tuned on CORD)
Donut model fine-tuned on CORD. It was introduced in the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim et al. and first released in [this repository](https://github.com/clovaai/donut).
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`, after which the decoder autoregressively generates text, conditioned on the encoder's output.
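As a toy sketch of this encode-then-decode flow (random numbers stand in for real weights; none of these names come from the Donut codebase, and the dimensions are shrunk for illustration):

```python
import random

random.seed(0)

# Toy stand-in for the Swin encoder output: a (batch_size, seq_len,
# hidden_size) tensor of embeddings, here just nested lists of floats.
batch_size, seq_len, hidden_size = 1, 4, 8  # tiny dims for the sketch
encoder_output = [
    [[random.random() for _ in range(hidden_size)] for _ in range(seq_len)]
    for _ in range(batch_size)
]

BOS, EOS, VOCAB_SIZE = 1, 0, 50  # hypothetical special token ids

def toy_decode_step(encoder_output, generated_ids):
    # Stand-in for the BART decoder: the next token id is some function of
    # the image encoding plus everything generated so far.
    score = sum(sum(sum(row) for row in batch) for batch in encoder_output)
    return int(score * 1000 + len(generated_ids)) % VOCAB_SIZE

generated_ids = [BOS]
for _ in range(10):  # cap generation length for the sketch
    next_id = toy_decode_step(encoder_output, generated_ids)
    generated_ids.append(next_id)
    if next_id == EOS:
        break

# The encoder output keeps the (batch_size, seq_len, hidden_size) shape:
print(len(encoder_output), len(encoder_output[0]), len(encoder_output[0][0]))
# prints: 1 4 8
```

The real model replaces `toy_decode_step` with a cross-attending BART decoder, but the loop structure (append one token, stop at EOS) is the same.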
# Use with Inference Endpoints
Hugging Face Inference Endpoints can work directly with binary data, which means we can send the image of our document straight to the endpoint. We are going to use `requests` to send our requests (make sure you have it installed: `pip install requests`).
![result](res.png)
## Send requests with Python
Load the sample image:
```bash
wget https://huggingface.co/philschmid/donut-base-finetuned-cord-v2/resolve/main/sample.png
```
Send a request to the endpoint:
```python
import requests as r
import mimetypes

ENDPOINT_URL = ""  # url of your endpoint
HF_TOKEN = ""  # token of the organization where you deployed your endpoint

def predict(path_to_image: str = None):
    with open(path_to_image, "rb") as i:
        b = i.read()
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": mimetypes.guess_type(path_to_image)[0],
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=b)
    return response.json()

prediction = predict(path_to_image="sample.png")
print(prediction)
# {'menu': [{'nm': '0571-1854 BLUS WANITA',
# 'unitprice': '@120.000',
# 'cnt': '1',
# 'price': '120,000'},
# {'nm': '1002-0060 SHOPPING BAG', 'cnt': '1', 'price': '0'}],
# 'total': {'total_price': '120,000',
# 'changeprice': '0',
# 'creditcardprice': '120,000',
# 'menuqty_cnt': '1'}}
```
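For reference, the custom `handler.py` that Inference Endpoints loads follows the `EndpointHandler` convention: a class with an `__init__(self, path)` that loads the model and a `__call__(self, data)` that runs inference. The sketch below illustrates that shape and is not a copy of this repository's handler; the imports are deferred into `__init__` so the class can be defined without `transformers` installed, and the exact contents of `data` depend on the toolkit version.

```python
class EndpointHandler:
    """Minimal sketch of the Inference Endpoints custom-handler shape."""

    def __init__(self, path: str = ""):
        # Deferred imports: only needed once the handler is instantiated.
        from transformers import DonutProcessor, VisionEncoderDecoderModel

        self.processor = DonutProcessor.from_pretrained(path)
        self.model = VisionEncoderDecoderModel.from_pretrained(path)

    def __call__(self, data: dict) -> dict:
        # The toolkit deserializes the binary request body into an image
        # and passes it under the "inputs" key.
        image = data["inputs"]
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        outputs = self.model.generate(pixel_values)
        sequence = self.processor.batch_decode(outputs)[0]
        # token2json turns the generated token sequence into the nested
        # dict shown in the example output above.
        return self.processor.token2json(sequence)
```

A production handler would typically also pass a task prompt via `decoder_input_ids` and strip special tokens before `token2json`; see this repository's `handler.py` for the actual implementation.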
**curl example**
```bash
curl https://ak7gduay2ypyr9vp.us-east-1.aws.endpoints.huggingface.cloud \
  -X POST \
  --data-binary '@sample.png' \
  -H "Authorization: Bearer XXX" \
  -H "Content-Type: image/png"
```