Lapa Ukrainian Handwriting OCR — LoRA Adapter

LoRA adapter on top of lapa-llm/lapa-v0.1.2-instruct (a Gemma-3-12B Ukrainian vision-language model) for Ukrainian handwritten-text recognition (HTR / OCR) on document crops.

The base Lapa model, applied zero-shot to handwriting crops, tends to paraphrase rather than transcribe literally. This adapter fine-tunes the text decoder with LoRA so that it emits a literal transcription of the text in the image. It was developed as the OCR component of a Ukrainian HTR pipeline that handles handwritten and printed regions as well as math formulas.

Results (internal validation)

Metric	Base Lapa (bf16, zero-shot)	Base + this LoRA
Handwritten CER	3.28	0.113
Handwritten exact-match	1.3%	47.7%
Printed CER	1.08	0.187

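A CER above 1 for the base model reflects heavy paraphrasing. CER is the character edit distance divided by the reference length, so output far longer than the ground truth pushes it past 1. The adapter removes this behavior and produces faithful transcriptions.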
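For reference, here is a minimal sketch of the two metrics in the table. It assumes the standard definitions: CER is the character-level Levenshtein distance divided by the reference length, and exact match compares whitespace-stripped strings. The card does not state its text normalization, so these functions (the names are illustrative) may not reproduce the table's numbers exactly. The example shows how a paraphrase-style output that wraps the correct text in extra words drives CER above 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by reference length; not bounded by 1."""
    return levenshtein(hypothesis, reference) / max(len(reference), 1)


def exact_match_rate(hyps: list, refs: list) -> float:
    """Fraction of crops whose stripped transcription equals the reference."""
    pairs = list(zip(hyps, refs))
    if not pairs:
        raise ValueError("no predictions to score")
    return sum(h.strip() == r.strip() for h, r in pairs) / len(pairs)


ref = "Добрий день"
print(cer("Добрий день", ref))                                    # 0.0
print(round(cer("На зображенні написано: Добрий день", ref), 2))  # 2.18
```

The paraphrase contains the correct text, but the 24 extra characters must all be deleted, and 24 / 11 exceeds 1.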

Intended use

Transcribing Ukrainian handwritten / printed text crops (region-level images, not full pages) into plain text.
Serving as a cross-vote / ensemble OCR partner alongside other VLMs.

Not tuned for: full-page layout, non-Ukrainian scripts, or marginal / very low-quality regions (CER rises to ~0.55 on hard, low-confidence crops).
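The cross-vote use case can be sketched as a medoid vote: each candidate transcription is scored by its average similarity to the other models' outputs, the most central one wins, and a low score flags the crop for review. This is an illustrative sketch, not the pipeline's actual voting rule. It uses difflib's `SequenceMatcher.ratio()` as a convenient stand-in for CER, and the model names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib: 2 * matched chars / total chars."""
    return SequenceMatcher(None, a, b).ratio()


def cross_vote(candidates: dict) -> tuple:
    """Pick the transcription that agrees most with the others.

    candidates maps a (hypothetical) model name to its transcription.
    Returns (winning model, its text, mean similarity to the other outputs).
    Ties keep the first candidate in insertion order.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two OCR outputs to cross-vote")
    best_name, best_score = "", -1.0
    for name, text in candidates.items():
        others = [t for n, t in candidates.items() if n != name]
        score = sum(similarity(text, o) for o in others) / len(others)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, candidates[best_name], best_score


outputs = {
    "lapa-ocr-lora": "Добрий день",
    "vlm_a": "Добрий день",
    "vlm_b": "Доброго дня",
}
name, text, agreement = cross_vote(outputs)
needs_review = agreement < 0.8  # hypothetical threshold for hard crops
print(name, text, round(agreement, 3), needs_review)
```

Low agreement between models is a cheap proxy for the hard, low-confidence crops mentioned above.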

How to use

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "lapa-llm/lapa-v0.1.2-instruct"
ADAPTER = "VmF0x/lapa-ocr-lora"  # this repo

# Load the base vision-language model in bf16 and attach the LoRA adapter.
base = AutoModelForImageTextToText.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
model = PeftModel.from_pretrained(base, ADAPTER).eval()
processor = AutoProcessor.from_pretrained(BASE)

PROMPT = "Transcribe Ukrainian text literally. Output only the text, no preamble."
img = Image.open("crop.png").convert("RGB")  # a region-level crop, not a full page
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": img},
        {"type": "text", "text": PROMPT},
    ],
}]

# Tokenize the prompt and preprocess the image. The dtype argument casts only
# floating-point tensors (pixel values); integer token ids keep their dtype.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt", padding=True,
).to(model.device, dtype=torch.bfloat16)

# Greedy decoding keeps the transcription deterministic.
with torch.inference_mode():
    gen = model.generate(**inputs, max_new_tokens=256, do_sample=False, num_beams=1)

# Decode only the newly generated tokens, dropping the echoed prompt.
text = processor.batch_decode(
    gen[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(text)
```
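To transcribe many crops at once, the single-image example can be batched. This is a minimal sketch, assuming a recent transformers release whose processor `apply_chat_template` accepts a list of conversations; `build_batch` is a hypothetical helper. Left padding matters for decoder-only generation: with it, every row's prompt ends at the same position, so slicing at `inputs["input_ids"].shape[1]` still isolates the generated tokens.

```python
from typing import Any, Sequence

DEFAULT_PROMPT = "Transcribe Ukrainian text literally. Output only the text, no preamble."


def build_batch(images: Sequence[Any], prompt: str = DEFAULT_PROMPT) -> list:
    """One single-turn user conversation per crop (hypothetical helper)."""
    return [
        [{
            "role": "user",
            "content": [
                {"type": "image", "image": img},
                {"type": "text", "text": prompt},
            ],
        }]
        for img in images
    ]


# Usage with `model`, `processor` and `torch` from the example above (not run here):
#   processor.tokenizer.padding_side = "left"  # align prompt ends across rows
#   inputs = processor.apply_chat_template(
#       build_batch(crops), add_generation_prompt=True, tokenize=True,
#       return_dict=True, return_tensors="pt", padding=True,
#   ).to(model.device, dtype=torch.bfloat16)
#   with torch.inference_mode():
#       gen = model.generate(**inputs, max_new_tokens=256, do_sample=False)
#   texts = [t.strip() for t in processor.batch_decode(
#       gen[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)]
```

Batch size is bounded by GPU memory; start small with a 12B model in bf16.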

Training

Base: lapa-llm/lapa-v0.1.2-instruct (vision tower frozen; text decoder adapted)
Method: LoRA (PEFT) — r=64, alpha=128, dropout=0.05, bias=none
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Task type: CAUSAL_LM
Epochs: 5 · LR: 1e-4 · batch: 2 × grad-accum 4 (effective batch size 8) · max_seq_len: 1024
Precision: bf16 · Hardware: 1× H100 80GB
Data: Ukrainian handwritten / printed text crops with literal transcriptions.
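The hyperparameters above map onto a PEFT `LoraConfig` roughly as follows. There is one subtlety. When `target_modules` is a list of names, PEFT adapts every module whose name ends in one of them, and Gemma-3's SigLIP vision tower also has `q_proj`, `k_proj` and `v_proj` layers. The card does not say how the vision tower was kept out of the adapter. This sketch assumes a regex string (which PEFT full-matches against module names) restricted to the language model. The regex, the helper names and the example module paths are assumptions, and module paths differ between transformers versions.

```python
import re

# Hyperparameters as listed in the Training section.
LORA_HPARAMS = dict(r=64, lora_alpha=128, lora_dropout=0.05, bias="none")
PROJ = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# Assumed regex: only projections somewhere under the language model, so the
# vision tower's attention layers (same q/k/v names) are left untouched.
TEXT_ONLY_TARGETS = r".*language_model.*\.(" + "|".join(PROJ) + r")"


def is_targeted(module_name: str) -> bool:
    """Mimic PEFT's handling of a string target_modules: a full regex match."""
    return re.fullmatch(TEXT_ONLY_TARGETS, module_name) is not None


def make_lora_config():
    """Build the config; imports peft lazily so the rest runs without it."""
    from peft import LoraConfig
    return LoraConfig(target_modules=TEXT_ONLY_TARGETS, task_type="CAUSAL_LM", **LORA_HPARAMS)


print(is_targeted("model.language_model.layers.0.self_attn.q_proj"))              # True
print(is_targeted("model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj"))  # False
```

Listing the modules that actually receive LoRA weights after `get_peft_model` is a quick way to confirm the vision tower was excluded.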

License

This adapter is a derivative of Gemma-3 (via Lapa) and is released under the Gemma Terms of Use. Use is subject to the Gemma Prohibited Use Policy. You must comply with the base model's license; see lapa-llm/lapa-v0.1.2-instruct.

Acknowledgements

Built on the Lapa LLM by the Ukrainian Catholic University, AGH University of Krakow, Igor Sikorsky Kyiv Polytechnic Institute, and Lviv Polytechnic. Base model: Gemma-3-12B (Google DeepMind).

Framework versions

PEFT 0.19.1
Transformers ≥ 4.50 (required for Gemma-3 support)
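A quick environment check against these versions can be sketched with only the standard library. `meets_minimum` compares numeric release components and ignores pre-release suffixes, so a dev build such as 4.50.0.dev0 counts as 4.50; the `packaging` library's `Version` handles such cases properly if it is installed. The helper names are illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Leading numeric release components, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(x) for x in m.group(0).split(".")) if m else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing release numbers only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    n = max(len(a), len(b))
    # Pad with zeros so that (4, 50) compares equal to (4, 50, 0).
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def installed_version(package: str):
    """Installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    tv = installed_version("transformers")
    ok = tv is not None and meets_minimum(tv, "4.50")
    print(f"transformers {tv}: {'ok' if ok else 'Gemma-3 needs >= 4.50'}")
    print(f"peft {installed_version('peft')} (adapter saved with 0.19.1)")
```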


Model tree for VmF0x/lapa-ocr-lora

google/gemma-3-12b-pt (base model) → lapa-llm/lapa-12b-pt (fine-tuned) → lapa-llm/lapa-v0.1.2-instruct (fine-tuned) → this adapter