Sinhala LightOnOCR-2-1B QLoRA Model 🇱🇰

Fine-tuned LightOnOCR-2-1B model for high-accuracy Sinhala language OCR on historical legal documents

📋 Model Description

This model is a QLoRA fine-tuned version of LightOnOCR-2-1B, optimized for Sinhala (සිංහල) OCR on historical and contemporary legal documents. It achieves 98.95% character accuracy on a 202-sample test set of Sri Lankan legal texts dated 1981-2019.

Key Features

  • 🎯 High Accuracy: 98.95% character accuracy on Sinhala legal documents
  • 📜 Historical Coverage: Evaluated on documents from 1981-2019
  • ⚡ Efficient: QLoRA fine-tuning with 4-bit quantization (~3.67% trainable parameters)
  • 🖥️ Optimized: Trained on NVIDIA RTX 4080 SUPER
  • 💾 Low Resource: Runs on consumer GPUs with 4-bit quantization
  • 🔄 Flexible Loading: Supports both QLoRA (4-bit base) and standard LoRA (unquantized bf16 base) inference

Model Details

| Property | Value |
| --- | --- |
| Base Model | lightonai/LightOnOCR-2-1B |
| Model Type | Vision-Language Model (VLM) |
| Fine-tuning Method | QLoRA (4-bit NF4 quantization + LoRA) |
| Language | Sinhala (සිංහල) |
| License | Apache 2.0 |
| Total Parameters | ~1.04B (base) |
| Trainable Parameters | 38.27M (3.67%) |
| Precision | 4-bit quantized (NF4) |
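
As a quick consistency check on the parameter rows above, the short calculation below derives the total implied by the reported trainable count and percentage. It is only arithmetic on the figures in the table; whether the 3.67% is measured against the base model alone or against base plus adapter is not stated here, so treat the result as approximate.

```python
# Figures copied from the Model Details table.
TRAINABLE_PARAMS_M = 38.27      # millions of trainable (LoRA) parameters
TRAINABLE_FRACTION = 0.0367     # 3.67% reported as trainable

# Total parameter count implied by the two numbers above, in millions.
implied_total_m = TRAINABLE_PARAMS_M / TRAINABLE_FRACTION

print(f"Implied total: ~{implied_total_m / 1000:.2f}B parameters")
```

The implied total is in line with the ~1.04B listed for the base model.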

📊 Performance Metrics

Overall Performance (202 Test Samples)

| Metric | Score | Description |
| --- | --- | --- |
| Character Accuracy | 98.95% | Percentage of correctly recognized characters |
| CER (Character Error Rate) | 0.0105 | Lower is better (0 = perfect) |
| WER (Word Error Rate) | 0.0563 | Word-level error rate |
| BLEU Score | 0.9808 | Text similarity score (0-1) |
| ANLS | 0.9895 | Average Normalized Levenshtein Similarity |
| METEOR | 0.9492 | Semantic similarity score |
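
The character accuracy and CER rows are consistent with accuracy = 1 - CER (1 - 0.0105 = 0.9895). The sketch below shows one common way to compute CER and WER with Levenshtein edit distance. It is an illustration, not the evaluation script used for this model: the function names are hypothetical, and it counts Unicode code points, whereas the original evaluation may have normalized text or counted Sinhala grapheme clusters differently.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: the same distance computed over whitespace-split tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - CER, floored at zero for very noisy output."""
    return max(0.0, 1.0 - cer(reference, hypothesis))
```

Because WER penalizes a whole word for a single wrong character, it is normally higher than CER, which matches the 0.0563 versus 0.0105 reported above.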

Summary Statistics

| Statistic | Value |
| --- | --- |
| Median Accuracy | 99.42% |
| Std Dev Accuracy | 1.34% |
| Samples ≥ 90% accuracy | 201/202 (99.5%) |
| Samples ≥ 80% accuracy | 202/202 (100%) |
| Samples < 50% accuracy | 0/202 (0%) |
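
Statistics like these can be derived from a list of per-sample character accuracies. The helper below is a minimal sketch under assumptions: `summarize` is a hypothetical name, accuracies are expected as fractions in [0, 1], and the population standard deviation is used because the card does not say which variant was reported.

```python
from statistics import median, pstdev


def summarize(accuracies: list[float]) -> dict:
    """Summary statistics over per-sample accuracies given as fractions in [0, 1]."""
    return {
        "n": len(accuracies),
        "median": median(accuracies),
        "std": pstdev(accuracies),  # population std dev (an assumption)
        "ge_90": sum(a >= 0.90 for a in accuracies),
        "ge_80": sum(a >= 0.80 for a in accuracies),
        "lt_50": sum(a < 0.50 for a in accuracies),
    }


# Made-up values for illustration only, not results from this model.
example = summarize([0.99, 0.95, 0.85, 1.0])
```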

🚀 Quick Start

Installation

pip install transformers==5.0.0 torch accelerate peft bitsandbytes Pillow

Option 1: QLoRA Inference (4-bit Quantized, Recommended for Low VRAM)

Load the base model with 4-bit quantization and apply the LoRA adapter on top. This matches the original training setup and requires ~2-3 GB VRAM.

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image

# Configuration
BASE_MODEL_ID = "lightonai/LightOnOCR-2-1B"
ADAPTER_ID = "avishadilhara/sinhala-lightonocr-2-1b-Qlora"
LONGEST_EDGE = 1540

# Load processor
processor = LightOnOcrProcessor.from_pretrained(ADAPTER_ID)
processor.tokenizer.padding_side = "left"

# Load base model with 4-bit quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = LightOnOcrForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)

# Load QLoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# Run inference
image = Image.open("your_image.png").convert("RGB")

messages = [
    {"role": "user", "content": [{"type": "image"}]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=text,
    images=[image],
    return_tensors="pt",
    size={"longest_edge": LONGEST_EDGE},
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

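# Decode only the newly generated tokens, not the prompt
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(result)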

Option 2: LoRA Inference (Unquantized bf16, Potentially Higher Quality)

Load the base model in bf16 without quantization and apply the LoRA adapter. This may give slightly better quality but requires ~4-5 GB VRAM.

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from peft import PeftModel
from PIL import Image

# Configuration
BASE_MODEL_ID = "lightonai/LightOnOCR-2-1B"
ADAPTER_ID = "avishadilhara/sinhala-lightonocr-2-1b-Qlora"
LONGEST_EDGE = 1540

# Load processor
processor = LightOnOcrProcessor.from_pretrained(ADAPTER_ID)
processor.tokenizer.padding_side = "left"

# Load base model in bf16 (no quantization)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter (same weights, no quantization on base)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# Run inference
image = Image.open("your_image.png").convert("RGB")

messages = [
    {"role": "user", "content": [{"type": "image"}]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=text,
    images=[image],
    return_tensors="pt",
    size={"longest_edge": LONGEST_EDGE},
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

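# Decode only the newly generated tokens, not the prompt
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(result)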

Note: Both options use the same LoRA adapter weights. The only difference is whether the base model is 4-bit quantized (QLoRA) or loaded unquantized in bf16 (LoRA). QLoRA uses less VRAM; the unquantized base may give slightly better quality.
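
To see roughly where the VRAM difference comes from, the sketch below estimates the memory taken by the base weights alone at each precision. It is a back-of-the-envelope calculation with a hypothetical helper, `weight_gb`, and it assumes every one of the ~1.04B parameters is stored at the stated bit width. In practice some modules usually stay unquantized, and activations, the KV cache, the adapter and CUDA overhead add to the total, which is why the figures quoted for the two options are higher.

```python
def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


BASE_PARAMS = 1.04e9  # approximate base parameter count from Model Details

nf4_gb = weight_gb(BASE_PARAMS, 4)    # 4-bit NF4 base (Option 1)
bf16_gb = weight_gb(BASE_PARAMS, 16)  # bf16 base (Option 2)

print(f"NF4 weights:  ~{nf4_gb:.2f} GB")
print(f"bf16 weights: ~{bf16_gb:.2f} GB")
```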


🔧 Training Details

Dataset

| Split | Samples |
| --- | --- |
| Train | 707 |
| Validation | 101 |
| Test | 202 |
| Total | 1010 |

Dataset: avishadilhara/sinhala-ocr-lk-acts-1010

QLoRA Configuration

| Parameter | Value |
| --- | --- |
| LoRA Rank (r) | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.1 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task Type | CAUSAL_LM |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
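
These settings map fairly directly onto a PEFT `LoraConfig`. The sketch below is an assumption about how they might be expressed, not the author's training script. `LORA_KWARGS` and `build_lora_config` are hypothetical names, and `peft` is imported lazily so the values can be inspected without it installed.

```python
# Hypothetical reconstruction of the adapter settings listed above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft.LoraConfig from the settings above (requires peft)."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With alpha set to twice the rank, the adapter update is scaled by 2.0. The 4-bit settings in the last two rows correspond to the `BitsAndBytesConfig` shown in Option 1.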

Training Arguments

| Parameter | Value |
| --- | --- |
| Max Epochs | 20 (early stopped at 4) |
| Batch Size | 4 |
| Learning Rate | 2e-4 (linear schedule) |
| Warmup Steps | 10 |
| Weight Decay | 0.001 |
| Max Grad Norm | 1.0 |
| Optimizer | AdamW (fused) |
| Precision | bf16 |
| Early Stopping | patience=1 |
| Image Size | longest_edge=1540 |
| Max Length | 4096 tokens |
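
The learning-rate rows describe a linear schedule with a short warmup. The sketch below mirrors the shape of that schedule as it is commonly implemented: a linear ramp from 0 to the peak rate, then linear decay to 0. The step counts are assumptions, not values from the card: they presume no gradient accumulation, a single GPU, no dropped last batch, and a schedule planned over all 20 epochs.

```python
import math

TRAIN_SAMPLES = 707
BATCH_SIZE = 4
MAX_EPOCHS = 20

# Assumed: one optimizer step per batch, last partial batch kept.
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
TOTAL_STEPS = STEPS_PER_EPOCH * MAX_EPOCHS


def linear_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 10,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each of the four epochs that actually ran.
for epoch in range(1, 5):
    print(epoch, f"{linear_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under these assumptions, early stopping at epoch 4 means the rate had decayed only part of the way from its peak when training ended.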

Training Loss

| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0336 | 0.0341 |
| 2 | 0.0284 | 0.0277 |
| 3 | 0.0205 | 0.0234 |
| 4 | 0.0139 | 0.0248 |

Best model selected at epoch 3 (lowest validation loss).
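
The stopping point follows from the validation losses and patience=1: epoch 4 is the first epoch that fails to improve on the best loss so far, so training stops there and the epoch 3 checkpoint is kept. The function below is a simplified illustration of patience-based early stopping with a hypothetical name, `early_stop`. It is not the callback used in training.

```python
def early_stop(val_losses: list[float], patience: int = 1,
               min_delta: float = 0.0) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses from the Training Loss table.
best_epoch, stop_epoch = early_stop([0.0341, 0.0277, 0.0234, 0.0248])
print(best_epoch, stop_epoch)
```

The widening gap between training loss (0.0139) and validation loss (0.0248) at epoch 4 suggests the model was starting to overfit, which is what the patience setting is meant to catch.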

Hardware

  • GPU: NVIDIA RTX 4080 SUPER
  • Training Time: ~3 hours (4 epochs)

🎓 Citation

If you use this model, please cite:

@misc{sinhala-lightonocr-2-1b-Qlora,
  author = {Avisha Dilhara},
  title = {Sinhala LightOnOCR-2-1B QLoRA: Fine-tuned OCR for Sinhala Legal Documents},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/avishadilhara/sinhala-lightonocr-2-1b-Qlora}
}