Nassq OCR v3 — Arabic Calligraphy Transcription

Fine-tuned from Gemma 4 E4B for OCR transcription of historical Arabic calligraphy (Naskh, Thuluth, Diwani, Kufic, Muhaqqaq). Trained on the HICMA dataset plus custom-collected samples.

Training approach

LoRA fine-tuning (r=8, alpha=16, dropout=0.05), OCR-only objective (no joint style classification — style classification is handled by a separate model)
Two-phase training: base fine-tune (700 steps, lr=1e-4) followed by a refinement phase (400 steps, lr=5e-5, reduced label smoothing) starting from the best base checkpoint
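
To make the hyperparameters above concrete, here is a small sketch in plain Python of what they imply. It assumes standard LoRA, where the low-rank update is scaled by alpha / r; the card does not say whether a variant such as rsLoRA was used. The class names and the 2048 x 2048 layer shape are hypothetical and chosen only for illustration; they are not taken from the training code or from the Gemma architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSpec:
    """LoRA hyperparameters as listed in the card."""
    r: int = 8
    alpha: int = 16
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the update B @ A by alpha / r.
        return self.alpha / self.r

    def added_params(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r).
        return self.r * (d_in + d_out)


@dataclass(frozen=True)
class Phase:
    """One training phase; label smoothing values are not stated in the card."""
    name: str
    steps: int
    lr: float


PHASES = [
    Phase("base", steps=700, lr=1e-4),
    Phase("refinement", steps=400, lr=5e-5),  # resumes from the best base checkpoint
]

spec = LoraSpec()
total_steps = sum(p.steps for p in PHASES)

print(spec.scaling)                    # update scale applied to B @ A
print(spec.added_params(2048, 2048))   # trainable params for one hypothetical 2048x2048 layer
print(total_steps)                     # optimizer steps across both phases
```

With r=8 and alpha=16 the update scale is 2, and each adapted matrix adds only r * (d_in + d_out) trainable parameters, which is why the refinement phase can restart cheaply from a saved adapter checkpoint.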

Test set results (602 held-out images)

Metric	Score
CER	20.65%
WER	48.17%
Levenshtein Ratio	86.22%
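
For readers who want to reproduce these metrics, the following is a minimal sketch of how CER, WER, and a Levenshtein ratio are commonly computed. The card does not state which library, text normalization, or ratio definition was used, so the function names and the indel-based ratio (substitution counted as cost 2, as in python-Levenshtein style libraries) are assumptions.

```python
def edit_distance(ref, hyp, sub_cost=1):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else sub_cost
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] using the indel-based definition (assumed)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - edit_distance(a, b, sub_cost=2)) / total


# One dropped letter in a four-letter word.
print(cer("كتاب", "كتب"))
# One wrong word out of three.
print(wer("بسم الله الرحمن", "بسم الله الرحيم"))
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same output, which matches the gap between the two scores above.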

Per-style CER

Style	CER	Test samples
Naskh	12.9%	374
Muhaqqaq	14.1%	74
Thuluth	31.6%	101
Kufic	47.4%	29
Diwani	51.6%	24

Known limitation: Kufic and Diwani have substantially higher error rates, primarily due to limited training data (under 250 images each in the full dataset) rather than a model architecture limitation.
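
The per-style table can be cross-checked against the overall figures. The sketch below sums the sample counts and computes the sample-weighted mean of the per-style CER values; the variable names are illustrative, and how the overall CER was aggregated (per image or pooled over all characters) is not stated in the card.

```python
# (CER in percent, number of test samples), copied from the per-style table.
per_style = {
    "Naskh": (12.9, 374),
    "Muhaqqaq": (14.1, 74),
    "Thuluth": (31.6, 101),
    "Kufic": (47.4, 29),
    "Diwani": (51.6, 24),
}

total_samples = sum(n for _, n in per_style.values())
weighted_cer = sum(c * n for c, n in per_style.values()) / total_samples

print(total_samples)           # 602, matching the held-out set size
print(round(weighted_cer, 2))  # 19.39
```

The counts add up to the 602 held-out images. The sample-weighted mean (about 19.39%) is somewhat below the reported overall CER of 20.65%; a gap of this kind can arise if the overall score is pooled over characters rather than averaged per image, so the two numbers need not match exactly.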

Usage

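from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("AyaEhab258/nassq-ocr-v3")
model = AutoModelForImageTextToText.from_pretrained("AyaEhab258/nassq-ocr-v3")

messages = [{"role": "user", "content": [
    {"type": "image", "image": Image.open("calligraphy.jpg")},
    {"type": "text", "text": "Transcribe the Arabic text in this image."},
]}]

# Assumed continuation (not in the original card): the usual transformers chat-template flow.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)  # token budget is an arbitrary choice
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))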

Model size: 8B params (Safetensors, BF16)
