How to use avishadilhara/sinhala-deepseek-ocr-Qlora with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline
pipe = pipeline("image-to-text", model="avishadilhara/sinhala-deepseek-ocr-Qlora")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("avishadilhara/sinhala-deepseek-ocr-Qlora", dtype="auto")

How to use avishadilhara/sinhala-deepseek-ocr-Qlora with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for avishadilhara/sinhala-deepseek-ocr-Qlora to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for avishadilhara/sinhala-deepseek-ocr-Qlora to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for avishadilhara/sinhala-deepseek-ocr-Qlora to start chatting

pip install unsloth

from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="avishadilhara/sinhala-deepseek-ocr-Qlora",
    max_seq_length=2048,
)

Fine-tuned DeepSeek-OCR model for high-accuracy Sinhala language OCR on historical legal documents
This model is a LoRA fine-tuned version of DeepSeek-OCR, optimized for Sinhala (සිංහල) OCR on historical and contemporary legal documents. It achieves 98% character accuracy (CER 0.020) on a 202-sample test set spanning over a century of Sri Lankan legal texts (1910-2024).

| Property | Value |
|---|---|
| Base Model | unsloth/DeepSeek-OCR |
| Model Type | Vision-Language Model (VLM) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Language | Sinhala (සිංහල) |
| License | Apache 2.0 |
| Parameters | ~3.5B (base) + 155M (LoRA trainable) |
| Precision | 4-bit quantized (inference) |

| Metric | Score | Description |
|---|---|---|
| Character Accuracy | 98.0% | Percentage of correctly recognized characters |
| CER (Character Error Rate) | 0.020 | Character-level error rate; lower is better (0 = perfect) |
| WER (Word Error Rate) | 0.045 | Word-level error rate; lower is better (0 = perfect) |
| BLEU Score | 0.965 | Text similarity score (0-1); higher is better |
| ANLS | 0.980 | Average Normalized Levenshtein Similarity (0-1); higher is better |
| METEOR | 0.975 | Semantic similarity score (0-1); higher is better |
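
For readers who want to reproduce the error-rate metrics, the following is a minimal sketch of how CER, WER, and character accuracy are commonly computed from Levenshtein edit distance. It is not the evaluation script used for this model, and the function names are hypothetical. It also assumes characters are counted as Unicode code points; Sinhala text contains combining vowel signs, so a grapheme-based count could give slightly different numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def character_accuracy(reference, hypothesis):
    """Character accuracy as 1 - CER, floored at 0."""
    return max(0.0, 1.0 - cer(reference, hypothesis))
```

Under this definition a CER of 0.020 corresponds to the 98.0% character accuracy reported in the table.
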

| Accuracy Threshold (cumulative) | Samples | Percentage |
|---|---|---|
| ≥ 99% | 65/202 | 32.2% |
| ≥ 95% | 145/202 | 71.8% |
| ≥ 90% | 185/202 | 91.6% |
| ≥ 80% | 197/202 | 97.5% |
| < 80% | 5/202 | 2.5% |
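
The rows above are cumulative: each "≥" row counts every sample at or above that threshold, so the counts overlap. As an illustration only, a table of this shape could be derived from per-sample character accuracies as sketched below. The function name `threshold_summary` and the sample values are hypothetical; the per-sample scores for this model are not published here.

```python
def threshold_summary(accuracies, thresholds=(0.99, 0.95, 0.90, 0.80)):
    """Return (label, count, fraction) rows for cumulative accuracy thresholds."""
    total = len(accuracies)
    if total == 0:
        raise ValueError("accuracies must not be empty")
    rows = []
    for t in thresholds:
        # Count every sample at or above the threshold (cumulative).
        count = sum(1 for a in accuracies if a >= t)
        rows.append((f">= {t:.0%}", count, count / total))
    # Final row: samples below the lowest threshold.
    lowest = min(thresholds)
    below = sum(1 for a in accuracies if a < lowest)
    rows.append((f"< {lowest:.0%}", below, below / total))
    return rows


# Hypothetical per-sample accuracies, for illustration only.
sample_accuracies = [1.0, 0.97, 0.92, 0.85, 0.5]
for label, count, fraction in threshold_summary(sample_accuracies):
    print(f"{label}: {count}/{len(sample_accuracies)} ({fraction:.1%})")
```
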

| Model | Character Accuracy | CER | Training Samples |
|---|---|---|---|
| This Model (A100, 6 epochs) | 98.0% | 0.020 | 707 |
| Baseline (P100, 3 epochs) | 96.98% | 0.030 | 707 |
| Improvement | +1.02 percentage points | -33% (relative) | - |
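
The two figures in the improvement row are different kinds of change: the accuracy gain is an absolute difference in percentage points, while the CER figure is a reduction relative to the baseline. A short sketch of the arithmetic, using the values from the table (the helper names are hypothetical):

```python
def percentage_point_gain(new_pct, old_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(new, old):
    """Change relative to the old value, as a fraction (negative = reduction)."""
    return (new - old) / old


# Values taken from the comparison table above.
acc_gain = percentage_point_gain(98.0, 96.98)
cer_change = relative_change(0.020, 0.030)

print(f"Accuracy gain: {acc_gain:+.2f} percentage points")
print(f"CER change: {cer_change:+.0%} relative to baseline")
```
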