Lapa Ukrainian Handwriting OCR — LoRA Adapter

A LoRA adapter for lapa-llm/lapa-v0.1.2-instruct (a Ukrainian vision-language model based on Gemma-3-12B) that performs Ukrainian handwritten-text recognition (HTR / OCR) on document crops.

The base Lapa model, applied zero-shot to handwriting crops, tends to paraphrase rather than transcribe literally. This adapter retrains the text decoder to emit a literal transcription of the text in the image. It was developed as an OCR component for a Ukrainian HTR pipeline (handwritten + printed regions, math formulas).

Results (internal validation)

Metric                    Base Lapa (bf16)   + this LoRA
Handwritten CER           3.28               0.113
Handwritten exact-match   1.3%               47.7%
Printed CER               1.08               0.187

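A CER above 1 for the base model reflects heavy paraphrasing: CER is the character edit distance divided by the length of the ground truth, so output far longer than the ground truth pushes it past 1. The adapter removes that behavior and produces faithful transcriptions.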
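To make the CER figures concrete, here is a minimal sketch of how character error rate and exact-match can be computed with the standard library. It uses the usual definition (Levenshtein distance over reference length); the function names are illustrative, and the normalization used for the reported numbers (whitespace, case, punctuation) is not stated in this card, so the sketch compares strings as given.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; can exceed 1.0."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def exact_match(reference: str, hypothesis: str) -> bool:
    """Assumption: exact match means identical strings, with no normalization."""
    return reference == hypothesis


# A paraphrasing-style output that wraps the answer in extra words
# is much longer than the reference, so its CER goes well above 1.
literal = cer("кіт", "кіт")
wordy = cer("кіт", "На зображенні написано слово кіт")
print(literal, wordy > 1)
```

Because the denominator is the reference length, a long preamble around a correct word is penalized once per extra character, which is how a base-model CER such as 3.28 can arise.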

Intended use

  • Transcribing Ukrainian handwritten / printed text crops (region-level images, not full pages) into plain text.
  • As a cross-vote / ensemble OCR partner alongside other VLMs.

Not tuned for: full-page layout, non-Ukrainian scripts, or marginal / very low-quality regions (CER rises to ~0.55 on hard, low-confidence crops).
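The cross-vote use mentioned above is not specified further in this card. As one hedged illustration, the sketch below picks a consensus transcription from several OCR outputs by medoid selection: it returns the candidate that is, on average, most similar to the others. The function name `pick_consensus` and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions for illustration, not the pipeline's actual voting scheme.

```python
from difflib import SequenceMatcher


def pick_consensus(candidates: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_similarity(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(
            SequenceMatcher(None, candidates[i], other).ratio()
            for other in others
        ) / len(others)

    # Ties are resolved in favor of the earliest candidate.
    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


# Hypothetical outputs from three models for the same crop.
outputs = ["Добрий день", "Добрий день", "Добрий лень"]
print(pick_consensus(outputs))
```

With only two candidates the similarity is symmetric, so the first one is always returned; a real ensemble would need a tie-break such as per-model confidence.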

How to use

import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "lapa-llm/lapa-v0.1.2-instruct"
ADAPTER = "lapa-llm/lapa-ocr-lora"  # this repo

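# Load the base model in bf16 and let Accelerate place it on the available device(s).
base = AutoModelForImageTextToText.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
# Attach the LoRA weights and switch to eval mode.
model = PeftModel.from_pretrained(base, ADAPTER).eval()
# The processor (tokenizer + image preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained(BASE)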

PROMPT = "Transcribe Ukrainian text literally. Output only the text, no preamble."
img = Image.open("crop.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": img},
        {"type": "text", "text": PROMPT},
    ],
}]

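# Build model inputs from the chat template; floating-point tensors are cast to bf16.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt", padding=True,
).to(model.device, dtype=torch.bfloat16)

# Greedy decoding keeps the transcription deterministic.
with torch.inference_mode():
    gen = model.generate(**inputs, max_new_tokens=256, do_sample=False, num_beams=1)
# Drop the prompt tokens so only the newly generated text is decoded.
text = processor.batch_decode(
    gen[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()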
print(text)

Training

  • Base: lapa-llm/lapa-v0.1.2-instruct (vision tower frozen; text decoder adapted)
  • Method: LoRA (PEFT) — r=64, alpha=128, dropout=0.05, bias=none
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Task type: CAUSAL_LM
  • Epochs: 5 · LR: 1e-4 · batch: 2 × grad-accum 4 (effective batch 8 on the single GPU) · max_seq_len: 1024
  • Precision: bf16 · Hardware: 1× H100 80GB
  • Data: Ukrainian handwritten / printed text crops with literal transcriptions.
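The hyperparameters listed above can be collected into a small configuration sketch. The keys follow the argument names of PEFT's `LoraConfig`; the variable names are illustrative, and the derived values assume standard LoRA scaling (alpha / r, i.e. no rank-stabilized variant) and a single device as listed under Hardware. This is a reconstruction from the list, not the author's training script.

```python
# Hyperparameters copied from the Training list.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Effective batch size = per-device batch x gradient accumulation x devices.
per_device_batch = 2
grad_accum_steps = 4
num_devices = 1  # 1x H100
effective_batch = per_device_batch * grad_accum_steps * num_devices

print(lora_scale, effective_batch)

# With PEFT installed, the same keys map onto LoraConfig:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```

Note that a plain list of module names may also match identically named projections outside the text decoder. How the vision tower was kept frozen during training is not stated here, so any restriction to decoder-only modules would be an additional assumption.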

License

This adapter is a derivative of Gemma-3 (via Lapa) and is released under the Gemma Terms of Use. Use is subject to the Gemma Prohibited Use Policy. You must comply with the base model's license; see lapa-llm/lapa-v0.1.2-instruct.

Acknowledgements

Built on the Lapa LLM by the Ukrainian Catholic University, AGH University of Krakow, Igor Sikorsky Kyiv Polytechnic Institute, and Lviv Polytechnic. Base model: Gemma-3-12B (Google DeepMind).

Framework versions

  • PEFT 0.19.1
  • Transformers (Gemma-3 support: ≥ 4.50)