TrOCR Fine-tuned on the MultiHTR Ukrainian Dataset

This model is a fine-tuned version of kazars24/trocr-base-handwritten-ru for recognizing Ukrainian handwritten text. It was trained on the datasets used for the MultiHTR Ukrainian Transkribus models (see Tikhonov & Rabus 2024).

Model Description

TrOCR (Transformer-based OCR) is a vision-to-text model using a ViT encoder and a causal language model decoder. This version is fine-tuned specifically on Ukrainian handwriting from the 19th and 20th centuries.

Key preprocessing choice, aspect-ratio preservation: line images are resized to a height of 128 px with the aspect ratio preserved, rather than being squashed to 384×384. Ukrainian manuscript lines typically have aspect ratios of 4:1–12:1, so direct resizing to 384×384 compresses the width by roughly 10×. Preserving the aspect ratio keeps more horizontal detail per character, raising the typical character width from ~7 px to ~28 px.
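The resize described above can be sketched as follows. This is a minimal illustration, assuming Pillow is used for resampling; the helper names `target_size` and `resize_line` are hypothetical and not part of the released code.

```python
def target_size(width: int, height: int, target_height: int = 128) -> tuple[int, int]:
    """Return (new_width, new_height) that scales an image to target_height
    while keeping the original width-to-height ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    new_width = max(1, round(width * scale))
    return new_width, target_height


def resize_line(image, target_height: int = 128):
    """Resize a PIL line image to target_height with LANCZOS resampling.
    Hypothetical helper; assumes Pillow is installed."""
    from PIL import Image  # imported lazily so target_size works without Pillow

    new_size = target_size(image.width, image.height, target_height)
    return image.resize(new_size, Image.LANCZOS)


# A 1200x100 px line (12:1) keeps its proportions at 128 px height
print(target_size(1200, 100))
```

With a 12:1 line, the width stays twelve times the height after resizing, whereas a 384×384 target would force the same content into a square.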

Training Data

  • Sources: Vernadskyi National Library of Ukraine (manuscripts by Taras Shevchenko and Klyment Kvitka); National Museum of the Holodomor-Genocide in Kyiv; GRAC corpus (Lviv Polytechnic University); Prozhito Project; Foundation of the International Memorial Association; for more detailed acknowledgements, see the links above
  • Scope: Ukrainian handwritten manuscripts and documents, primarily 19th–20th century
  • Size: 19,307 training lines, 4,827 validation lines (773 pages)
  • Preprocessing: resize to 128 px height, aspect ratio preserved (LANCZOS); no background normalization
  • Train / Val split: 80% / 20%
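The reported sizes are consistent with an 80/20 split of 24,134 lines. The sketch below shows one way such a split could be produced. The source does not state whether the split was random, or whether it was made per line or per page, so the shuffling, the seed, and the function name `split_lines` are all assumptions.

```python
import random


def split_lines(items, train_fraction=0.8, seed=0):
    """Shuffle items deterministically and split them into train/validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in IDs for the 19,307 + 4,827 = 24,134 line images
line_ids = range(24_134)
train, val = split_lines(line_ids)
print(len(train), len(val))
```

A page-level split (keeping all lines of a page in the same subset) would be a reasonable alternative if leakage between writers or pages is a concern.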

Performance

| Metric | Value |
|---|---|
| CER (validation) | 9.94% |
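For reference, CER is conventionally computed as the character-level edit distance between prediction and ground truth, divided by the ground-truth length. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model; whether the reported figure was averaged per line or over the whole validation set is not stated.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions separating the two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # skip a reference character
                current[j - 1] + 1,      # skip a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edit distance over total reference characters.
    Expects two equal-length lists of strings with at least one reference character."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One substituted character out of three
print(cer(["кіт"], ["кит"]))
```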

Training Details

| Parameter | Value |
|---|---|
| Base model | kazars24/trocr-base-handwritten-ru |
| Optimizer | Adafactor |
| Learning rate | 4e-5 |
| Effective batch size | 96 |
| Epochs | 20 |
| FP16 | Yes |
| Augmentation | Rotation ±2°, brightness/contrast ±0.3 |
| Framework | HuggingFace Transformers, Seq2SeqTrainer |
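The parameters above could translate into a configuration along the following lines. It is a sketch only: the keyword names are those accepted by `Seq2SeqTrainingArguments` in HuggingFace Transformers, but the split of the effective batch size of 96 into per-device batch size, gradient-accumulation steps, and device count is an assumption, as is the reading of ±0.3 as a multiplicative factor in [0.7, 1.3].

```python
import random

# Values from the training details; these would be passed, together with an
# output directory, as Seq2SeqTrainingArguments(**training_kwargs).
training_kwargs = {
    "optim": "adafactor",
    "learning_rate": 4e-5,
    "num_train_epochs": 20,
    "fp16": True,
    "predict_with_generate": True,      # assumption: generate text during evaluation
    "per_device_train_batch_size": 24,  # assumption
    "gradient_accumulation_steps": 4,   # assumption
}
NUM_DEVICES = 1  # assumption


def effective_batch_size(kwargs, num_devices):
    """Samples contributing to one optimizer step."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the reported ranges."""
    return {
        "angle_deg": rng.uniform(-2.0, 2.0),
        "brightness": rng.uniform(0.7, 1.3),  # assumed multiplicative factor
        "contrast": rng.uniform(0.7, 1.3),    # assumed multiplicative factor
    }


print(effective_batch_size(training_kwargs, NUM_DEVICES))
print(sample_augmentation(random.Random(0)))
```

The sampled values could then be applied with Pillow, for example through `Image.rotate` and `ImageEnhance.Brightness` / `ImageEnhance.Contrast`; the augmentation library actually used in training is not named in the source.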

How to Use

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model
processor = TrOCRProcessor.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")

# Open a single cropped text-line image and convert it to 3-channel RGB
image = Image.open("line_image.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token IDs, then decode them into the recognized text
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)

Acknowledgements

Funded by the Ministry of Science, Research and the Arts of Baden-Württemberg as part of the digital@bw digitization strategy.

Citation

If you use this model, please cite:

@article{tikhonov_rabus_2024,
  author    = {Tikhonov, Aleksej and Rabus, Achim},
  title     = {Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century:
               Possibilities, Challenges, and the Future of the First Generic AI-based Model},
  journal   = {Kyiv-Mohyla Humanities Journal},
  volume    = {11},
  year      = {2024},
  pages     = {226--247},
  doi       = {10.18523/2313-4895.11.2024.226-247}
}

MultiHTR project funded by the Ministry of Science, Research and the Arts of Baden-Württemberg (digital@bw).

TrOCR architecture:

@article{li2021trocr,
  title   = {TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
  author  = {Li, Minghao and Lv, Tengchao and Chen, Jingye and Cui, Lei and Lu, Yijuan and
             Florencio, Dinei and Zhang, Cha and Li, Zhoujun and Wei, Furu},
  journal = {arXiv preprint arXiv:2109.10282},
  year    = {2021}
}