Sinhala DeepSeek-OCR LoRA Model 🇱🇰

Fine-tuned DeepSeek-OCR model for high-accuracy Sinhala language OCR on historical legal documents

Model Description

This model is a LoRA fine-tuned version of DeepSeek-OCR specifically optimized for Sinhala (සිංහල) language OCR on historical and contemporary legal documents. The model achieves 98% character accuracy on a test set spanning over a century of Sri Lankan legal texts (1910-2024).

Key Features

High Accuracy: 98.0% character accuracy (CER 0.020) on Sinhala legal documents
Historical Coverage: Trained on documents from 1910-2024
Efficient: LoRA adapters run on top of a 4-bit quantized base model with minimal quality loss
Production Ready: Optimized for inference with the Unsloth framework
Low Resource: Runs on consumer GPUs with 4-bit quantization (~6GB VRAM)

Model Details

Property	Value
Base Model	unsloth/DeepSeek-OCR
Model Type	Vision-Language Model (VLM)
Fine-tuning Method	LoRA (Low-Rank Adaptation)
Language	Sinhala (සිංහල)
License	Apache 2.0
Parameters	~3.5B (base) + 155M (LoRA trainable)
Precision	4-bit quantized (inference)
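
As a rough sanity check on the parameter counts above, the sketch below estimates the trainable fraction and the memory taken by the weights alone. It assumes 4 bits per base weight and 16-bit LoRA weights (the adapter precision is an assumption, not stated in this card), and it ignores activations, caches, and quantization overhead, which is why the result sits well below the ~6GB VRAM figure.

```python
# Back-of-the-envelope estimate; parameter counts are taken from the table above.
BASE_PARAMS = 3.5e9   # ~3.5B base parameters
LORA_PARAMS = 155e6   # 155M trainable LoRA parameters


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Share of all parameters that is trainable during fine-tuning.
trainable_fraction = LORA_PARAMS / (BASE_PARAMS + LORA_PARAMS)

base_gb = weight_memory_gb(BASE_PARAMS, 4)    # 4-bit quantized base
lora_gb = weight_memory_gb(LORA_PARAMS, 16)   # assumed 16-bit adapter

print(f"Trainable fraction: {trainable_fraction:.1%}")
print(f"Base weights (4-bit): {base_gb:.2f} GB")
print(f"LoRA weights (16-bit, assumed): {lora_gb:.2f} GB")
```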

Performance Metrics

Overall Performance

Metric	Score	Description
Character Accuracy	98.0%	Percentage of correctly recognized characters
CER (Character Error Rate)	0.020	Lower is better (0 = perfect)
WER (Word Error Rate)	0.045	Word-level error rate (lower is better)
BLEU Score	0.965	Text similarity score (0-1)
ANLS	0.980	Average Normalized Levenshtein Similarity
METEOR	0.975	Semantic similarity score
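
The character- and word-level metrics above are typically derived from edit distance. The following is a minimal sketch of one common way to compute CER, WER, and character accuracy; it is not the evaluation script used for this model. It compares Unicode code points, whereas Sinhala text could also be scored per grapheme cluster, and the card does not say which unit was used.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy as 1 - CER, clipped at zero."""
    return max(0.0, 1.0 - cer(reference, hypothesis))
```

Under this definition a CER of 0.020 corresponds to 98.0% character accuracy, which matches the first two rows of the table.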

Accuracy Distribution

Accuracy Threshold (cumulative)	Number of Samples	Percentage
≥ 99%	65/202	32.2%
≥ 95%	145/202	71.8%
≥ 90%	185/202	91.6%
≥ 80%	197/202	97.5%
< 80%	5/202	2.5%
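
The rows above are cumulative: each threshold counts every sample at or above it, so only the last two rows sum to 100%. A minimal sketch of how such a table can be built from per-sample accuracies follows; the function name and the sample values are hypothetical and not taken from the actual test set.

```python
def accuracy_distribution(accuracies, thresholds=(0.99, 0.95, 0.90, 0.80)):
    """Return (label, count, total, percentage) rows for cumulative thresholds."""
    total = len(accuracies)
    rows = []
    for t in thresholds:
        count = sum(1 for a in accuracies if a >= t)
        rows.append((f">= {t:.0%}", count, total, 100.0 * count / total))
    # Final row: everything below the lowest threshold.
    lowest = min(thresholds)
    below = sum(1 for a in accuracies if a < lowest)
    rows.append((f"< {lowest:.0%}", below, total, 100.0 * below / total))
    return rows


# Hypothetical per-sample character accuracies, for illustration only.
sample_accuracies = [0.995, 0.97, 0.93, 0.85, 0.60]
for label, count, total, pct in accuracy_distribution(sample_accuracies):
    print(f"{label}\t{count}/{total}\t{pct:.1f}%")
```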

Baseline Comparison

Model	Character Accuracy	CER	Training Samples
This Model (A100, 6 epochs)	98.0%	0.020	707
Baseline (P100, 3 epochs)	96.98%	0.030	707
Improvement	+1.02 pp	-33% (relative)	-
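
The two figures in the Improvement row use different conventions: the accuracy gain is an absolute difference in percentage points, while the CER change is relative to the baseline. A short sketch of the arithmetic, using the values from the table (the helper names are illustrative):

```python
def absolute_gain_pp(new_pct: float, old_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


accuracy_gain = absolute_gain_pp(98.0, 96.98)    # character accuracy, in %
cer_change = relative_change_pct(0.020, 0.030)   # character error rate

print(f"Accuracy gain: {accuracy_gain:+.2f} pp")
print(f"CER change: {cer_change:+.0f}% relative to baseline")
```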

Model tree for avishadilhara/sinhala-deepseek-ocr-Qlora

Base model: deepseek-ai/DeepSeek-OCR
Finetuned: unsloth/DeepSeek-OCR
Adapter: this model

Evaluation results (self-reported, Sinhala Legal Acts OCR)

Metric	Score
Character Accuracy	98.0%
Character Error Rate	0.020
BLEU Score	0.965