Nassq OCR v3 — Arabic Calligraphy Transcription

Nassq OCR v3 is fine-tuned from Gemma 4 E4B to transcribe historical Arabic calligraphy (Naskh, Thuluth, Diwani, Kufic, Muhaqqaq). It was trained on the HICMA dataset plus custom-collected samples.

Training approach

  • LoRA fine-tuning (r=8, alpha=16, dropout=0.05) with an OCR-only objective; there is no joint style classification, which is handled by a separate model
  • Two-phase training: base fine-tune (700 steps, lr=1e-4) followed by a refinement phase (400 steps, lr=5e-5, reduced label smoothing) starting from the best base checkpoint
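
As a rough illustration of what these hyperparameters imply, the sketch below records the LoRA settings and the two phases in plain Python. It assumes the standard LoRA scaling of alpha / r, which the card does not state explicitly. The class names and the 2048 x 2048 layer shape are hypothetical and chosen only for the arithmetic; they are not taken from the actual training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRASettings:
    # Values from the card; the class itself is illustrative only.
    r: int = 8
    alpha: int = 16
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r (assumed here).
        return self.alpha / self.r

    def extra_params(self, d_out: int, d_in: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r).
        return self.r * (d_in + d_out)


@dataclass(frozen=True)
class Phase:
    name: str
    steps: int
    lr: float


# Step counts and learning rates as listed in the card.
PHASES = [Phase("base", 700, 1e-4), Phase("refinement", 400, 5e-5)]

lora = LoRASettings()
total_steps = sum(p.steps for p in PHASES)

print(lora.scaling)                   # 2.0
print(lora.extra_params(2048, 2048))  # 32768 (hypothetical layer shape)
print(total_steps)                    # 1100
```

With r=8 and alpha=16 the adapter update is scaled by 2, and the refinement phase runs at half the base learning rate.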

Test set results (602 held-out images)

| Metric | Score |
| --- | --- |
| CER | 20.65% |
| WER | 48.17% |
| Levenshtein Ratio | 86.22% |
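
For readers unfamiliar with these metrics, here is a minimal sketch of how CER and WER are commonly computed from edit distance. It assumes unit-cost insertions, deletions and substitutions, and normalization by the reference length. The card does not say which evaluation library or text normalization was used, so these functions may not reproduce the reported numbers exactly. The Levenshtein ratio is left out because its definition varies between libraries.

```python
def levenshtein(ref, hyp) -> int:
    """Edit distance between two sequences with unit-cost edits."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by reference word count."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / max(len(ref_words), 1)


# One extra letter in the last word: 1 of 15 characters, 1 of 3 words.
reference = "بسم الله الرحمن"
hypothesis = "بسم الله الرحمان"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

The example also shows why WER sits well above CER: a single wrong character makes the whole word count as an error.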

Per-style CER

| Style | CER | Test samples |
| --- | --- | --- |
| Naskh | 12.9% | 374 |
| Muhaqqaq | 14.1% | 74 |
| Thuluth | 31.6% | 101 |
| Kufic | 47.4% | 29 |
| Diwani | 51.6% | 24 |

Known limitation: Kufic and Diwani have substantially higher error rates, primarily due to limited training data (under 250 images each in the full dataset) rather than a model architecture limitation.
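
The per-style figures can be aggregated in a few lines to see how much the style mix drives the headline number. The sketch below uses only the values reported above. The variable names are illustrative, and the card does not state how the overall CER was averaged across images.

```python
# (CER in percent, number of test images) per style, as reported above.
per_style = {
    "Naskh": (12.9, 374),
    "Muhaqqaq": (14.1, 74),
    "Thuluth": (31.6, 101),
    "Kufic": (47.4, 29),
    "Diwani": (51.6, 24),
}

total_samples = sum(n for _, n in per_style.values())

# Weight each style by its number of test images.
sample_weighted = sum(c * n for c, n in per_style.values()) / total_samples

# Give every style equal weight regardless of size.
macro = sum(c for c, _ in per_style.values()) / len(per_style)

print(total_samples)             # 602
print(f"{sample_weighted:.2f}")  # 19.39
print(f"{macro:.2f}")            # 31.52
```

The image-weighted mean (about 19.4%) is close to, but not identical with, the reported overall CER of 20.65%. One possible explanation is that the overall score is weighted by character count rather than by image, though that is an assumption. The much higher macro average (about 31.5%) reflects how strongly Naskh, with 374 of the 602 test images, dominates the aggregate.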

Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("AyaEhab258/nassq-ocr-v3")
model = AutoModelForImageTextToText.from_pretrained("AyaEhab258/nassq-ocr-v3")

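messages = [{"role": "user", "content": [
    {"type": "image", "image": Image.open("calligraphy.jpg")},
    {"type": "text", "text": "Transcribe the Arabic text in this image."},
]}]

# Build model inputs (prompt tokens plus image features) from the chat messages
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=model.dtype)

# Generate, then decode only the newly produced tokens
# (max_new_tokens=256 is an arbitrary limit; adjust it for longer texts)
output_ids = model.generate(**inputs, max_new_tokens=256)
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True))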