donut-docai: Korean Transaction-Statement Parser

Fine-tuned Donut (naver-clova-ix/donut-base) that reads an image of a Korean transaction statement (거래명세표 / 계산서) and outputs structured JSON directly, with no separate OCR step or rule engine.

Code & full pipeline: https://github.com/KyoungsoonKim00/donut-document-ai

Usage

import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load the fine-tuned processor and model from the Hugging Face Hub.
processor = DonutProcessor.from_pretrained("ksk00/donut-docai")
model = VisionEncoderDecoderModel.from_pretrained("ksk00/donut-docai")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

# Preprocess the document image into model-ready pixel values.
image = Image.open("document.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Start decoding from the task prompt used during fine-tuning.
decoder_input_ids = processor.tokenizer(
    "<s_gt_parse>", return_tensors="pt", add_special_tokens=False
).input_ids.to(device)

# Generate the output token sequence with beam search.
outputs = model.generate(
    pixel_values, decoder_input_ids=decoder_input_ids,
    max_length=512, num_beams=5,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
)
# Decode the generated ids to text and print the first (only) result.
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
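
The decoded text is Donut's tag sequence (for example `<s_key>value</s_key>`) rather than a JSON object. `DonutProcessor.token2json` converts such a sequence into a dict. It relies on the field tags being present in the string, so if they were registered as special tokens, decode without `skip_special_tokens=True` before calling it. To illustrate what that conversion does, here is a simplified pure-Python stand-in. The function name `tags_to_dict`, the nested layout, and the sample values are illustrative assumptions, and list separators such as `<sep/>` are not handled.

```python
import re

# Matches one <s_key>...</s_key> field, including any fields nested inside it.
FIELD = re.compile(r"<s_(?P<key>[^>]+)>(?P<body>.*?)</s_(?P=key)>", re.DOTALL)


def tags_to_dict(sequence: str) -> dict:
    """Turn a Donut-style tag sequence into a nested dict (simplified sketch)."""
    result = {}
    for match in FIELD.finditer(sequence):
        key, body = match.group("key"), match.group("body")
        # Recurse when the body contains further field tags; otherwise keep the text.
        result[key] = tags_to_dict(body) if "<s_" in body else body.strip()
    return result


# Hypothetical decoded output; real values depend on the input image.
sample = (
    "<s_gt_parse>"
    "<s_서류특성><s_서류종류>거래명세표</s_서류종류>"
    "<s_합계금액>110,000</s_합계금액></s_서류특성>"
    "<s_피공급자><s_이름>홍길동</s_이름></s_피공급자>"
)
parsed = tags_to_dict(sample)
print(parsed)
```

The unclosed task prompt `<s_gt_parse>` is skipped because it has no matching closing tag. For real use, prefer `processor.token2json`, which also handles repeated items.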

Output schema

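Group                               Fields
서류특성.* (document attributes)    서류종류 (document type), 거래일 (transaction date), 합계금액 (total amount)
피공급자.* (recipient)              이름 (name), 거래전미지급금 (unpaid balance before the transaction), 입금액 (amount paid), 현잔액 (current balance)
품목.* (line items)                 품목명 (item name), 코드 (code), 단위 (unit), 수량 (quantity), 단가 (unit price), 공급가액 (supply value), 세액 (tax amount), 수량합계 (total quantity), 공급가액합계 (total supply value), 세액합계 (total tax)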
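
Because the model can drop or garble fields, it may help to check a parsed result against the schema before using it. The sketch below assumes the output is a nested dict keyed by group and then by field, and that 품목 may be either a single dict or a list of line items. The helper name `missing_fields` and the sample values are hypothetical.

```python
from typing import Dict, List

# Groups and fields as listed in the output schema.
EXPECTED_FIELDS: Dict[str, List[str]] = {
    "서류특성": ["서류종류", "거래일", "합계금액"],
    "피공급자": ["이름", "거래전미지급금", "입금액", "현잔액"],
    "품목": ["품목명", "코드", "단위", "수량", "단가", "공급가액", "세액",
             "수량합계", "공급가액합계", "세액합계"],
}


def missing_fields(parsed: dict) -> List[str]:
    """Return 'group.field' paths that are absent or empty in `parsed`."""
    missing = []
    for group, fields in EXPECTED_FIELDS.items():
        value = parsed.get(group)
        # Normalize to a list so one dict and several line items are handled alike.
        entries = value if isinstance(value, list) and value else [value]
        for entry in entries:
            for field in fields:
                if not isinstance(entry, dict) or not entry.get(field):
                    missing.append(f"{group}.{field}")
    return missing


# Hypothetical parsed output containing only the first group.
sample = {"서류특성": {"서류종류": "거래명세표", "거래일": "2024-01-15", "합계금액": "110,000"}}
print(missing_fields(sample))
```

If the totals (수량합계, 공급가액합계, 세액합계) appear only once rather than on every line item, the per-item check would need to be relaxed for those three fields.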

Training

  • Base: naver-clova-ix/donut-base (Swin-B encoder + mBART decoder)
  • Image size 720×960, task prompt <s_gt_parse>, max length 512
  • AdamW lr 5e-5, weight decay 0.01, warmup 5%, 15 epochs, fp16, gradient checkpointing
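
To make "warmup 5%" concrete, the listed hyperparameters can be gathered in a small config object and the warmup length derived from the total number of optimizer steps. This is a sketch, not the repository's training script: the `TrainConfig` class and `warmup_steps` helper are hypothetical, and the dataset size (40) and batch size (2) are placeholders, since the card only says "tens of documents".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    base_model: str = "naver-clova-ix/donut-base"
    task_prompt: str = "<s_gt_parse>"
    image_size: tuple = (720, 960)  # as listed on the card; axis order is not stated
    max_length: int = 512
    learning_rate: float = 5e-5
    weight_decay: float = 0.01
    warmup_ratio: float = 0.05
    num_epochs: int = 15
    fp16: bool = True
    gradient_checkpointing: bool = True


def warmup_steps(cfg: TrainConfig, num_examples: int, batch_size: int) -> int:
    """Number of warmup steps as a fraction of all optimizer steps."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * cfg.num_epochs
    return round(total_steps * cfg.warmup_ratio)


cfg = TrainConfig()
# Placeholder dataset and batch sizes: 20 steps per epoch, 300 steps in total.
print(warmup_steps(cfg, num_examples=40, batch_size=2))  # 15
```

With so few steps in total, the warmup phase lasts only a handful of updates, which is worth keeping in mind when changing the batch size.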

Limitations

Trained on a small in-house dataset (tens of documents). The model overfits and can collapse into repeated tokens on unseen layouts. Treat it as a proof of concept rather than a production-ready model. See the GitHub repo for improvement directions.
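
One way to guard against the repeated-token collapse is to inspect the generated ids before trusting the output. The heuristic below is a minimal sketch and not part of the repository: the function name `has_repetition_collapse` and the thresholds (`ngram=4`, `max_repeats=3`) are arbitrary assumptions that would need tuning on real outputs.

```python
from typing import Sequence


def has_repetition_collapse(token_ids: Sequence[int], ngram: int = 4,
                            max_repeats: int = 3) -> bool:
    """Return True if some n-gram repeats back-to-back more than `max_repeats` times."""
    ids = list(token_ids)
    for start in range(len(ids) - ngram + 1):
        pattern = ids[start:start + ngram]
        repeats = 1
        pos = start + ngram
        # Count consecutive copies of the same n-gram starting at `start`.
        while ids[pos:pos + ngram] == pattern:
            repeats += 1
            pos += ngram
        if repeats > max_repeats:
            return True
    return False


looping = [11, 12, 13, 14] * 5        # the same 4-gram five times in a row
healthy = list(range(50))             # no repeated n-grams
print(has_repetition_collapse(looping), has_repetition_collapse(healthy))
```

Flagged outputs could be sent to manual review. `model.generate` also accepts `no_repeat_ngram_size` and `repetition_penalty`, which may reduce loops, although they can also suppress values that legitimately repeat across line items.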
