Get the json content from image with naver-clova-ix/donut-base-finetuned-docvqa

by Joyantac33 - opened Jun 29, 2023


I am using the "naver-clova-ix/donut-base-finetuned-docvqa" model and want to print the full JSON result after it reads the image, without supplying any prompt or user input. I just want it to parse the image and return the full JSON content. How can I achieve that? Please help. I am using the code below:

import re
import gradio as gr

import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def process_document(image, question):
    # prepare encoder inputs
    pixel_values = processor(image, return_tensors="pt").pixel_values
    print(pixel_values)

    # prepare decoder inputs
    task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
    prompt = task_prompt.replace("{user_input}", question)
    decoder_input_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids
    print(decoder_input_ids)

    # generate answer
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # postprocess
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # remove first task start token

    json_content = processor.token2json(sequence)
    print(json_content)  # Print the full JSON content

    return json_content
    # return processor.token2json(sequence)

description = "Gradio Demo for Donut, an instance of VisionEncoderDecoderModel fine-tuned on DocVQA (document visual question answering). To use it, simply upload your image and type a question and click 'submit', or click one of the examples to load them. Read more at the links below."
article = "Donut: OCR-free Document Understanding Transformer | Github Repo"

demo = gr.Interface(
    fn=process_document,
    inputs=["image", "text"],
    outputs="json",
    title="Demo: Donut 🍩 for DocVQA",
    description=description,
    article=article,
    enable_queue=True,
    examples=[["example_1.png", "When is the coffee break?"], ["example_2.jpeg", "What's the population of Stoddard?"]],
    cache_examples=False,
)

demo.launch()

nielsr

Owner Jun 29, 2023

Hi,

For that you need to use a model like https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2 (or fine-tune one yourself on a custom dataset).

Next, you just need to give the decoder start token as the prompt, and the model will generate the entire JSON.

See this notebook as an example: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Quick_inference_with_DONUT_for_Document_Parsing.ipynb.
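
For reference, the approach described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: it assumes the CORD-v2 checkpoint uses "<s_cord-v2>" as its decoder start token, and the names MODEL_ID, TASK_PROMPT, clean_sequence and parse_document are made up for illustration. The generation and post-processing calls mirror the ones already used in the question.

```python
import re

MODEL_ID = "naver-clova-ix/donut-base-finetuned-cord-v2"
TASK_PROMPT = "<s_cord-v2>"  # assumed decoder start token for this checkpoint


def clean_sequence(sequence, eos_token="</s>", pad_token="<pad>"):
    """Drop EOS/PAD tokens and the first (task start) token from a decoded sequence."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_document(image_path):
    """Parse one document image into a JSON-like dict, with no user question."""
    # Heavy imports are kept local so clean_sequence can be used on its own.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # Encoder input: the document image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Decoder input: only the task start token, no question.
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    return processor.token2json(sequence)
```

Calling parse_document("example_1.png") should then return the parsed fields as a dict. In a Gradio demo, the interface would only need an "image" input, since there is no question to type. Note that the keys in the output follow the CORD receipt schema, so a document type other than receipts will most likely need a fine-tuned checkpoint.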

Joyantac33

Jun 29, 2023

Hi,

Thank you for the information. Can you help me with the line dataset = load_dataset("hf-internal-testing/example-documents")? What format should the dataset have, and how do I point it to my own custom folder instead of "hf-internal-testing/example-documents"?

Joyantac33

Jun 30, 2023

I am trying to train a custom model but I am getting the error below. Can you help?

!python /content/donut/train.py --config /content/sample_data/train_cord.yaml

dataset_name_or_paths:
- ../content/pan_set
train_batch_sizes:
- 1
check_val_every_n_epochs: 10
max_steps: -1
result_path: /content/pan_set
exp_name: train_cord
exp_version: 20230630_083635
Config is saved at /content/pan_set/train_cord/20230630_083635/config.yaml
Traceback (most recent call last):
  File "/content/donut/train.py", line 149, in <module>
    train(config)
  File "/content/donut/train.py", line 55, in train
    pl.utilities.seed.seed_everything(config.get("seed", 42), workers=True)
AttributeError: module 'pytorch_lightning.utilities.seed' has no attribute 'seed_everything'
