Lapa Ukrainian Handwriting OCR — LoRA Adapter
LoRA adapter on top of lapa-llm/lapa-v0.1.2-instruct
(a Gemma-3-12B Ukrainian vision-language model) for Ukrainian handwritten-text
recognition (HTR / OCR) on document crops.
The base Lapa model, applied zero-shot to handwriting crops, tends to paraphrase rather than transcribe literally. This adapter retrains the text decoder to emit a literal transcription of the text in the image. It was developed as an OCR component for a Ukrainian HTR pipeline (handwritten + printed regions, math formulas).
Results (internal validation)
| Metric | Base Lapa (bf16) | + this LoRA |
|---|---|---|
| Handwritten CER | 3.28 | 0.113 |
| Handwritten exact-match | 1.3% | 47.7% |
| Printed CER | 1.08 | 0.187 |
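A CER above 1 on the base model reflects heavy paraphrasing: CER is the character edit distance divided by the ground-truth length, so output far longer than the ground truth pushes it past 1. The adapter removes that behavior and produces far more literal transcriptions.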
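To make the metrics in the table concrete, here is a minimal sketch of character error rate and exact match, assuming the standard definition (Levenshtein distance divided by reference length). The card does not say which implementation or text normalization produced the reported numbers, so the function names and the whitespace stripping below are illustrative assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length; can exceed 1."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def exact_match(ref: str, hyp: str) -> bool:
    """True when the strings are identical after stripping outer whitespace (an assumption)."""
    return ref.strip() == hyp.strip()


# A single wrong character in a three-character word.
near_miss = cer("кіт", "кит")
# A paraphrase-style output that wraps the correct word in extra text.
paraphrase = cer("кіт", "на фото кіт")
print(near_miss, paraphrase)
```

The second call shows how a hypothesis that contains the right word but adds surrounding text scores above 1, because every extra character counts as an insertion against a short reference.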
Intended use
- Transcribing Ukrainian handwritten / printed text crops (region-level images, not full pages) into plain text.
- As a cross-vote / ensemble OCR partner alongside other VLMs.
Not tuned for: full-page layout, non-Ukrainian scripts, or marginal / very low-quality regions (CER rises to ~0.55 on hard, low-confidence crops).
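The card does not describe how the cross-vote is performed. As one hypothetical scheme, the sketch below picks the transcription that is on average most similar to the other candidates (a medoid vote), using `difflib.SequenceMatcher` from the standard library. The function name `pick_consensus` and the similarity measure are assumptions for illustration, not the pipeline's actual method.

```python
from difflib import SequenceMatcher


def pick_consensus(candidates: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_similarity(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        scores = [
            SequenceMatcher(None, candidates[i], other, autojunk=False).ratio()
            for other in others
        ]
        return sum(scores) / len(scores)

    # max() keeps the first index on ties, so candidate order breaks ties.
    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


# Hypothetical outputs from three OCR models for the same crop.
outputs = ["Добрий день", "Добрий день", "Добрий деньок"]
print(pick_consensus(outputs))
```

A medoid vote only ever returns one of the existing candidates, so it cannot repair an error that every model makes; character-level alignment voting would be a heavier alternative.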
How to use
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
BASE = "lapa-llm/lapa-v0.1.2-instruct"
ADAPTER = "VmF0x/lapa-ocr-lora"  # this repo
base = AutoModelForImageTextToText.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
model = PeftModel.from_pretrained(base, ADAPTER).eval()
processor = AutoProcessor.from_pretrained(BASE)
PROMPT = "Transcribe Ukrainian text literally. Output only the text, no preamble."
img = Image.open("crop.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": img},
        {"type": "text", "text": PROMPT},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt", padding=True,
).to(model.device, dtype=torch.bfloat16)
with torch.inference_mode():
    gen = model.generate(**inputs, max_new_tokens=256, do_sample=False, num_beams=1)
# Decode only the newly generated tokens, skipping the prompt.
text = processor.batch_decode(
    gen[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(text)
Training
- Base: lapa-llm/lapa-v0.1.2-instruct (vision tower frozen; text decoder adapted)
- Method: LoRA (PEFT), r=64, alpha=128, dropout=0.05, bias=none
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Task type: CAUSAL_LM
- Epochs: 5 · LR: 1e-4 · batch: 2 × grad-accum 4 · max_seq_len: 1024
- Precision: bf16 · Hardware: 1× H100 80GB
- Data: Ukrainian handwritten / printed text crops with literal transcriptions.
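The hyperparameters above map onto PEFT's `LoraConfig` roughly as follows. This is a sketch reconstructed from the list, not the original training script: the variable names are illustrative, and it does not show how the vision tower was kept frozen, which the card does not specify. The `peft` import is guarded so the derived values can be inspected without the library installed.

```python
# Values copied from the Training list; names are illustrative.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

# Batch 2 with 4 gradient-accumulation steps on a single GPU.
effective_batch_size = 2 * 4

try:
    from peft import LoraConfig
    lora_config = LoraConfig(**LORA_KWARGS)
except ImportError:
    # peft is optional for this sketch; the dict above still documents the setup.
    lora_config = None

print(lora_scaling, effective_batch_size)
```

With alpha set to twice the rank, the adapter update is scaled by 2, and each optimizer step sees 8 samples.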
License
This adapter is a derivative of Gemma-3 (via Lapa) and is released under the
Gemma Terms of Use. Use is subject to the
Gemma Prohibited Use Policy.
You must comply with the base model's license; see
lapa-llm/lapa-v0.1.2-instruct.
Acknowledgements
Built on the Lapa LLM by the Ukrainian Catholic University, AGH University of Krakow, Igor Sikorsky Kyiv Polytechnic Institute, and Lviv Polytechnic. Base model: Gemma-3-12B (Google DeepMind).
Framework versions
- PEFT 0.19.1
- Transformers (Gemma-3 support: ≥ 4.50)