Sinhala DeepSeek-OCR LoRA Model 🇱🇰


Fine-tuned DeepSeek-OCR model for high-accuracy Sinhala language OCR on historical legal documents



Model Description

This model is a LoRA fine-tuned version of DeepSeek-OCR specifically optimized for Sinhala (සිංහල) language OCR on historical and contemporary legal documents. The model achieves 98% character accuracy on a test set spanning over a century of Sri Lankan legal texts (1910-2024).

Key Features

  • High Accuracy: 98.0% character accuracy (CER 0.020) on a 202-sample test set of Sinhala legal documents
  • Historical Coverage: Trained on documents from 1910-2024
  • Efficient: LoRA fine-tuning allows 4-bit quantization with minimal quality loss
  • Production Ready: Optimized for inference with the Unsloth framework
  • Low Resource: Runs on consumer GPUs with 4-bit quantization (~6GB VRAM)

Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/DeepSeek-OCR |
| Model Type | Vision-Language Model (VLM) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Language | Sinhala (සිංහල) |
| License | Apache 2.0 |
| Parameters | ~3.5B (base) + 155M (LoRA trainable) |
| Precision | 4-bit quantized (inference) |
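As a rough sanity check on the parameter counts above, the weight memory can be estimated with simple arithmetic. This is a back-of-envelope sketch, not a measurement: it assumes exactly 0.5 bytes per 4-bit base parameter (ignoring quantization metadata) and 16-bit storage for the LoRA weights, which the model card does not state. The helper name `weight_memory_gb` is hypothetical.

```python
# Back-of-envelope weight memory from the parameter counts in the table.
# Assumptions (not stated in the model card): 4-bit base weights with no
# quantization overhead, and 16-bit LoRA adapter weights.

BASE_PARAMS = 3.5e9   # ~3.5B base parameters (from the table)
LORA_PARAMS = 155e6   # 155M trainable LoRA parameters (from the table)


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


base_gb = weight_memory_gb(BASE_PARAMS, 4)    # 4-bit quantized base model
lora_gb = weight_memory_gb(LORA_PARAMS, 16)   # assumed 16-bit adapter
trainable_fraction = LORA_PARAMS / BASE_PARAMS

print(f"base weights:  {base_gb:.2f} GB")
print(f"LoRA weights:  {lora_gb:.2f} GB")
print(f"trainable / base parameters: {trainable_fraction:.1%}")
```

Under these assumptions the weights alone come to roughly 2 GB. The rest of the ~6GB figure would then be activations, image features, and runtime overhead, none of which this sketch models.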

Performance Metrics

Overall Performance

| Metric | Score | Description |
|---|---|---|
| Character Accuracy | 98.0% | Percentage of correctly recognized characters |
| CER (Character Error Rate) | 0.020 | Lower is better (0 = perfect) |
| WER (Word Error Rate) | 0.045 | Word-level error rate; lower is better (0 = perfect) |
| BLEU Score | 0.965 | Text similarity score (0-1) |
| ANLS | 0.980 | Average Normalized Levenshtein Similarity |
| METEOR | 0.975 | Semantic similarity score |
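The error-rate metrics above are conventionally based on Levenshtein edit distance. The following is a minimal sketch of how CER, WER, and character accuracy can be computed. It is not the evaluation script used for this model: it assumes character accuracy is defined as 1 - CER (consistent with the reported 98.0% and 0.020), counts characters as Unicode code points rather than grapheme clusters, and splits words on whitespace. All function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - CER, clipped at zero."""
    return max(0.0, 1.0 - cer(reference, hypothesis))


# One substituted character out of four gives CER 0.25, accuracy 0.75.
print(cer("abcd", "abxd"), character_accuracy("abcd", "abxd"))
```

For Sinhala text, counting by code point treats a consonant and its vowel sign as separate characters. A grapheme-cluster count would give different numbers, so the unit should match whatever the reported scores used.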

Accuracy Distribution

| Accuracy Threshold (cumulative) | Number of Samples | Percentage |
|---|---|---|
| ≥ 99% | 65/202 | 32.2% |
| ≥ 95% | 145/202 | 71.8% |
| ≥ 90% | 185/202 | 91.6% |
| ≥ 80% | 197/202 | 97.5% |
| < 80% | 5/202 | 2.5% |
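The thresholds in this table are cumulative: a sample at 99.5% accuracy is counted in every "≥" row. A short sketch of how such a table can be derived from per-sample accuracies follows. The function name `cumulative_distribution` and the sample values are hypothetical, since the per-sample scores are not published here. The last lines only re-derive the percentages from the counts reported above.

```python
def cumulative_distribution(accuracies, thresholds=(0.99, 0.95, 0.90, 0.80)):
    """Return (threshold, count, percent) for samples at or above each threshold."""
    total = len(accuracies)
    if total == 0:
        raise ValueError("need at least one sample")
    rows = []
    for t in thresholds:
        count = sum(1 for a in accuracies if a >= t)
        rows.append((t, count, 100.0 * count / total))
    return rows


# Hypothetical per-sample accuracies, for illustration only.
sample_accuracies = [1.0, 0.96, 0.92, 0.85, 0.50]
for threshold, count, percent in cumulative_distribution(sample_accuracies):
    print(f">= {threshold:.0%}: {count}/{len(sample_accuracies)} ({percent:.1f}%)")

# Re-deriving the reported percentages from the reported counts (202 samples).
reported_counts = {"≥ 99%": 65, "≥ 95%": 145, "≥ 90%": 185, "≥ 80%": 197, "< 80%": 5}
reported_percent = {k: round(100 * v / 202, 1) for k, v in reported_counts.items()}
print(reported_percent)
```

The "≥ 80%" and "< 80%" rows partition the test set, which is why 197 and 5 sum to 202.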

Baseline Comparison

| Model | Character Accuracy | CER | Training Samples |
|---|---|---|---|
| This Model (A100, 6 epochs) | 98.0% | 0.020 | 707 |
| Baseline (P100, 3 epochs) | 96.98% | 0.030 | 707 |
| Improvement | +1.02 percentage points | -33% (relative) | - |
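The two improvement figures use different conventions: the accuracy gain is an absolute difference in percentage points, while the CER change is relative to the baseline. A small sketch of the arithmetic, using only the numbers from the table (the function names are illustrative):

```python
def absolute_gain_pp(new_percent: float, base_percent: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_percent - base_percent


def relative_change(new_value: float, base_value: float) -> float:
    """Relative change versus the baseline, as a fraction (negative = reduction)."""
    return (new_value - base_value) / base_value


accuracy_gain = absolute_gain_pp(98.0, 96.98)   # character accuracy, in %
cer_change = relative_change(0.020, 0.030)      # character error rate

print(f"accuracy gain: {accuracy_gain:+.2f} percentage points")
print(f"CER change:    {cer_change:+.0%} relative to baseline")
```

Read this way, a gain of about one percentage point in accuracy corresponds to removing roughly a third of the baseline's character errors.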

Model repository: avishadilhara/sinhala-deepseek-ocr-Qlora (adapter)