Sinhala LightOnOCR-2-1B QLoRA Model 🇱🇰

Fine-tuned LightOnOCR-2-1B model for high-accuracy Sinhala language OCR on historical legal documents

📋 Model Description

This model is a QLoRA fine-tuned version of LightOnOCR-2-1B, optimized for Sinhala (සිංහල) OCR on historical and contemporary legal documents. It achieves 98.95% character accuracy on a 202-sample test set of Sri Lankan legal texts spanning nearly four decades (1981-2019).

Key Features

🎯 High Accuracy: 98.95% character accuracy on Sinhala legal documents
📜 Historical Coverage: Evaluated on documents from 1981-2019
⚡ Efficient: QLoRA fine-tuning with 4-bit quantization (~3.67% trainable parameters)
🖥️ Consumer Hardware: Trained on an NVIDIA RTX 4080 SUPER in ~3 hours
💾 Low Resource: Runs on consumer GPUs with 4-bit quantization
🔄 Flexible Loading: Supports both QLoRA (4-bit) and standard LoRA (unquantized bf16) inference

Model Details

Property	Value
Base Model	lightonai/LightOnOCR-2-1B
Model Type	Vision-Language Model (VLM)
Fine-tuning Method	QLoRA (4-bit NF4 quantization + LoRA)
Language	Sinhala (සිංහල)
License	Apache 2.0
Total Parameters	~1.04B (base)
Trainable Parameters	38.27M (3.67%)
Precision	4-bit quantized (NF4)

📊 Performance Metrics

Overall Performance (202 Test Samples)

Metric	Score	Description
Character Accuracy	98.95%	Percentage of correctly recognized characters
CER (Character Error Rate)	0.0105	Lower is better (0 = perfect)
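WER (Word Error Rate)	0.0563	Word-level edit distance divided by reference word count (lower is better)
BLEU Score	0.9808	N-gram overlap with the reference text (0-1, higher is better)
ANLS	0.9895	Average Normalized Levenshtein Similarity (0-1, higher is better)
METEOR	0.9492	Word-match score combining precision and recall (0-1, higher is better)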
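
For readers who want to reproduce the character- and word-level numbers on their own data, the following is a minimal sketch of how CER, WER, and character accuracy are commonly computed from Levenshtein distance. It is not the evaluation script used for this model. The function names are illustrative, and the sketch assumes characters are counted as Unicode code points; counting Sinhala grapheme clusters instead would give different values.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy taken as 1 - CER, floored at 0."""
    return max(0.0, 1.0 - cer(reference, hypothesis))


if __name__ == "__main__":
    print(cer("සිංහල", "සිංහල"))
    print(character_accuracy("abcd", "abxd"))
```

Under this definition the reported figures are consistent with each other: a CER of 0.0105 corresponds to a character accuracy of 98.95%.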

Summary Statistics

Statistic	Value
Median Accuracy	99.42%
Std Dev Accuracy	1.34%
Samples ≥ 90% accuracy	201/202 (99.5%)
Samples ≥ 80% accuracy	202/202 (100%)
Samples < 50% accuracy	0/202 (0%)
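
The summary statistics above can be derived from a list of per-sample character accuracies. The sketch below shows one way to do that. The sample values are made up for illustration, and the use of the population standard deviation is an assumption, since the card does not state which variant was used.

```python
import statistics


def summarize(accuracies):
    """Summarize per-sample accuracies given as fractions in [0, 1]."""
    return {
        "median": statistics.median(accuracies),
        # Assumption: population standard deviation (pstdev), not sample stdev.
        "std_dev": statistics.pstdev(accuracies),
        "at_least_90": sum(a >= 0.90 for a in accuracies),
        "at_least_80": sum(a >= 0.80 for a in accuracies),
        "below_50": sum(a < 0.50 for a in accuracies),
        "total": len(accuracies),
    }


if __name__ == "__main__":
    # Hypothetical per-sample accuracies, not taken from the real test set.
    demo = [0.99, 0.95, 0.85, 1.0]
    for key, value in summarize(demo).items():
        print(key, value)
```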

🚀 Quick Start

Installation

pip install torch accelerate transformers==5.0.0 peft bitsandbytes Pillow

Option 1: QLoRA Inference (4-bit Quantized — Recommended for Low VRAM)

Load the base model with 4-bit quantization and apply the LoRA adapter on top. This matches the original training setup and requires ~2-3 GB VRAM.

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image

# Configuration
BASE_MODEL_ID = "lightonai/LightOnOCR-2-1B"
ADAPTER_ID = "avishadilhara/sinhala-lightonocr-2-1b-Qlora"
LONGEST_EDGE = 1540

# Load processor
processor = LightOnOcrProcessor.from_pretrained(ADAPTER_ID)
processor.tokenizer.padding_side = "left"

# Load base model with 4-bit quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = LightOnOcrForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)

# Load QLoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# Run inference
image = Image.open("your_image.png").convert("RGB")

messages = [
    {"role": "user", "content": [{"type": "image"}]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=text,
    images=[image],
    return_tensors="pt",
    size={"longest_edge": LONGEST_EDGE},
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

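# Decode only the newly generated tokens (drop the prompt tokens)
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]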
print(result)

Option 2: LoRA Inference (Unquantized bf16, Potentially Higher Quality)

Load the base model in bf16 without quantization and apply the same LoRA adapter. This may give slightly better quality but requires ~4-5 GB VRAM.

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from peft import PeftModel
from PIL import Image

# Configuration
BASE_MODEL_ID = "lightonai/LightOnOCR-2-1B"
ADAPTER_ID = "avishadilhara/sinhala-lightonocr-2-1b-Qlora"
LONGEST_EDGE = 1540

# Load processor
processor = LightOnOcrProcessor.from_pretrained(ADAPTER_ID)
processor.tokenizer.padding_side = "left"

# Load base model in bf16 (no quantization)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter (same weights, no quantization on base)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# Run inference
image = Image.open("your_image.png").convert("RGB")

messages = [
    {"role": "user", "content": [{"type": "image"}]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=text,
    images=[image],
    return_tensors="pt",
    size={"longest_edge": LONGEST_EDGE},
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

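# Decode only the newly generated tokens (drop the prompt tokens)
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]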
print(result)

Note: Both options use the same LoRA adapter weights. The only difference is whether the base model is loaded with 4-bit quantization (QLoRA) or unquantized in bf16 (LoRA). The 4-bit path uses less VRAM; the bf16 path may give slightly better quality.
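
To see roughly where the VRAM difference comes from, the following back-of-the-envelope sketch estimates the memory taken by the weights alone. It uses the parameter counts from the Model Details table and assumes every base parameter is stored at the stated bit width and that the adapter is held in bf16. In practice, bitsandbytes quantizes only linear layers and stores extra quantization constants, and activations, the KV cache, and the CUDA context add to the total, so treat these as lower bounds.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


BASE_PARAMS = 1.04e9      # ~1.04B base parameters (from the model card)
ADAPTER_PARAMS = 38.27e6  # 38.27M LoRA parameters (from the model card)

if __name__ == "__main__":
    nf4 = weight_memory_gb(BASE_PARAMS, 4)
    bf16 = weight_memory_gb(BASE_PARAMS, 16)
    # Assumption: adapter weights are kept in bf16 at inference time.
    adapter = weight_memory_gb(ADAPTER_PARAMS, 16)
    print(f"Base weights, 4-bit NF4: {nf4:.2f} GB")
    print(f"Base weights, bf16:      {bf16:.2f} GB")
    print(f"LoRA adapter, bf16:      {adapter:.2f} GB")
```

Under these assumptions the base weights take about 0.52 GB in 4-bit and about 2.08 GB in bf16, with the adapter adding less than 0.1 GB. The rest of the ~2-3 GB and ~4-5 GB figures quoted above is runtime overhead.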

🔧 Training Details

Dataset

Split	Samples
Train	707
Validation	101
Test	202
Total	1010

Dataset: avishadilhara/sinhala-ocr-lk-acts-1010
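
As a quick check on the split sizes, the short sketch below computes each split's share of the 1010 samples. The dictionary keys are illustrative labels and may not match the split names used in the dataset repository.

```python
# Split sizes from the table above; key names are illustrative.
SPLITS = {"train": 707, "validation": 101, "test": 202}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

if __name__ == "__main__":
    for name, count in SPLITS.items():
        print(f"{name}: {count} samples ({shares[name]:.0%})")
    print(f"total: {total}")
```

This works out to a 70/10/20 train/validation/test split.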

QLoRA Configuration

Parameter	Value
LoRA Rank (r)	32
LoRA Alpha	64
LoRA Dropout	0.1
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Task Type	CAUSAL_LM
Quantization	4-bit NF4 with double quantization
Compute dtype	bfloat16
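
The table above maps fairly directly onto PEFT and bitsandbytes configuration objects. The sketch below shows one way to express it. It is not the author's training script: the `LORA_KWARGS`, `BNB_KWARGS`, and `build_configs` names are illustrative, and the heavy imports are deferred into the function so the settings can be inspected without a GPU stack installed.

```python
# Values copied from the QLoRA Configuration table.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_configs():
    """Build the config objects (requires torch, peft, and transformers)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    lora_config = LoraConfig(**LORA_KWARGS)
    bnb_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **BNB_KWARGS
    )
    return lora_config, bnb_config


# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

if __name__ == "__main__":
    print(f"LoRA scaling factor: {LORA_SCALING}")
```

With rank 32 and alpha 64, the low-rank update is scaled by a factor of 2 before being added to the frozen base weights.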

Training Arguments

Parameter	Value
Max Epochs	20 (early stopped at 4)
Batch Size	4
Learning Rate	2e-4 (linear schedule)
Warmup Steps	10
Weight Decay	0.001
Max Grad Norm	1.0
Optimizer	AdamW (fused)
Precision	bf16
Early Stopping	patience=1
Image Size	longest_edge=1540
Max Length	4096 tokens
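
If the run is reproduced with the Hugging Face `Trainer`, the arguments above could be expressed roughly as follows. This is a sketch under assumptions: the card does not say which training loop was used. The per-epoch evaluation and checkpointing settings and the `output_dir` value are not in the table and are added here so that early stopping has a validation loss to monitor. The step count assumes a single GPU with no gradient accumulation.

```python
import math

# Values from the Training Arguments table, plus clearly marked assumptions.
TRAINING_KWARGS = {
    "num_train_epochs": 20,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "linear",
    "warmup_steps": 10,
    "weight_decay": 0.001,
    "max_grad_norm": 1.0,
    "optim": "adamw_torch_fused",
    "bf16": True,
    # Assumptions (not stated in the table): evaluate and save every epoch,
    # and restore the checkpoint with the lowest validation loss.
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,
}


def build_training_setup(output_dir="outputs"):
    """Build TrainingArguments and the callback (requires transformers)."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    early_stopping = EarlyStoppingCallback(early_stopping_patience=1)
    return args, early_stopping


TRAIN_SAMPLES = 707
# Assumption: one GPU, no gradient accumulation.
STEPS_PER_EPOCH = math.ceil(
    TRAIN_SAMPLES / TRAINING_KWARGS["per_device_train_batch_size"]
)

if __name__ == "__main__":
    print(f"Optimizer steps per epoch: {STEPS_PER_EPOCH}")
```

Under those assumptions each epoch is 177 optimizer steps, so the 10 warmup steps cover only a small fraction of the first epoch.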

Training Loss

Epoch	Training Loss	Validation Loss
1	0.0336	0.0341
2	0.0284	0.0277
3	0.0205	0.0234
4	0.0139	0.0248

Best model selected at epoch 3 (lowest validation loss).
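
The interaction between the loss table and `patience=1` can be traced with a small simulation. The sketch below is a simplified stand-in for an early-stopping callback, not the training code itself; `early_stop` is an illustrative helper name.

```python
def early_stop(val_losses, patience=1):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are 1-indexed.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Validation losses from the Training Loss table.
VAL_LOSSES = [0.0341, 0.0277, 0.0234, 0.0248]

if __name__ == "__main__":
    stop_epoch, best_epoch = early_stop(VAL_LOSSES, patience=1)
    print(f"Stopped at epoch {stop_epoch}, best epoch {best_epoch}")
```

The validation loss rises from 0.0234 to 0.0248 at epoch 4 while the training loss keeps falling, so training stops there and the epoch 3 checkpoint is kept.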

Hardware

GPU: NVIDIA RTX 4080 SUPER
Training Time: ~3 hours (4 epochs)

🎓 Citation

If you use this model, please cite:

@misc{sinhala-lightonocr-2-1b-Qlora,
  author = {Avisha Dilhara},
  title = {Sinhala LightOnOCR-2-1B QLoRA: Fine-tuned OCR for Sinhala Legal Documents},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/avishadilhara/sinhala-lightonocr-2-1b-Qlora}
}
