Nassq OCR v3 — Arabic Calligraphy Transcription
Nassq OCR v3 is fine-tuned from Gemma 4 E4B to transcribe historical Arabic calligraphy (Naskh, Thuluth, Diwani, Kufic, Muhaqqaq). It was trained on the HICMA dataset plus custom-collected samples.
Training approach
- LoRA fine-tuning (r=8, alpha=16, dropout=0.05) with an OCR-only objective (no joint style classification; a separate model handles style classification)
- Two-phase training: base fine-tune (700 steps, lr=1e-4) followed by a refinement phase (400 steps, lr=5e-5, reduced label smoothing) starting from the best base checkpoint
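The hyperparameters listed above can be collected into a small configuration sketch. This is only an illustration: the class and field names (`LoraSettings`, `Phase`, `PHASES`) are hypothetical and not taken from the training code, and the label-smoothing values are left unset because the card does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as listed in the card."""
    r: int = 8
    alpha: int = 16
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.alpha / self.r


@dataclass(frozen=True)
class Phase:
    """One training phase; label_smoothing is None because the card gives no value."""
    name: str
    steps: int
    learning_rate: float
    label_smoothing: Optional[float] = None


LORA = LoraSettings()
PHASES = [
    Phase("base", steps=700, learning_rate=1e-4),
    # The refinement phase starts from the best base checkpoint.
    Phase("refinement", steps=400, learning_rate=5e-5),
]

# Steps scheduled across both phases (an upper bound on the steps behind the
# final weights, since refinement resumes from the best base checkpoint).
total_steps = sum(phase.steps for phase in PHASES)
print(LORA.scaling, total_steps)  # prints: 2.0 1100
```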
Test set results (602 held-out images)
| Metric | Score |
|---|---|
| CER | 20.65% |
| WER | 48.17% |
| Levenshtein Ratio | 86.22% |
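For readers who want to compute comparable numbers on their own data, here is a minimal sketch of the metrics. CER and WER follow the usual definition (edit distance divided by reference length). The card does not define its Levenshtein Ratio, so `similarity_ratio` below uses one common normalization (1 minus distance over the longer string) purely as an assumption; it may not match the reported 86.22%. All function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def similarity_ratio(ref: str, hyp: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(ref), len(hyp))
    return 1.0 if longest == 0 else 1 - edit_distance(ref, hyp) / longest


reference = "بسم الله الرحمن"
prediction = "بسم اله الرحمن"  # one dropped letter in the second word
print(f"CER={cer(reference, prediction):.3f}  WER={wer(reference, prediction):.3f}")
```

A single dropped letter costs one character edit but invalidates the whole word, which is why WER sits far above CER in the table.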
Per-style CER
| Style | CER | Test samples |
|---|---|---|
| Naskh | 12.9% | 374 |
| Muhaqqaq | 14.1% | 74 |
| Thuluth | 31.6% | 101 |
| Kufic | 47.4% | 29 |
| Diwani | 51.6% | 24 |
Known limitation: Kufic and Diwani have substantially higher error rates, primarily due to limited training data (under 250 images each in the full dataset) rather than a model architecture limitation.
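The per-style table can be cross-checked against the overall CER. The sketch below assumes each style's CER is weighted by its number of test images; the card does not say how the overall figure was aggregated, so this is a sanity check, not a reproduction.

```python
# (CER in percent, number of test images), copied from the per-style table.
per_style = {
    "Naskh": (12.9, 374),
    "Muhaqqaq": (14.1, 74),
    "Thuluth": (31.6, 101),
    "Kufic": (47.4, 29),
    "Diwani": (51.6, 24),
}

total_images = sum(count for _, count in per_style.values())
image_weighted_cer = sum(c * count for c, count in per_style.values()) / total_images

print(total_images)                  # prints: 602
print(round(image_weighted_cer, 2))  # prints: 19.39
```

The image counts sum to the 602 held-out images, and the image-weighted average (about 19.4%) lands slightly below the reported 20.65%. One possible explanation, not confirmed by the card, is that the overall CER is pooled over characters rather than images, so longer or harder samples weigh more.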
Usage
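from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
processor = AutoProcessor.from_pretrained("AyaEhab258/nassq-ocr-v3")
model = AutoModelForImageTextToText.from_pretrained("AyaEhab258/nassq-ocr-v3")
messages = [{"role": "user", "content": [
    {"type": "image", "image": Image.open("calligraphy.jpg")},
    {"type": "text", "text": "Transcribe the Arabic text in this image."},
]}]
# Build model inputs from the chat messages (prompt text plus image).
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device)
# max_new_tokens=256 is an assumed budget, not a value from the card; adjust for longer texts.
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))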