Whisper Small Urdu v2 🎙️

This model is a fine-tuned version of khawajaaliarshad/whisper-small-urdu, optimized for Urdu speech-to-text. It was trained as part of a research initiative to improve automatic speech recognition (ASR) for low-resource languages.

Model Results

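The model captures much of the phonetic content of Urdu speech. Its character error rate (12.05%) is roughly a third of its word error rate (35.44%), which suggests that many word-level errors are near-misses, such as a wrong inflectional ending in Urdu's complex morphology, rather than entirely wrong words.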

Metric                        Value
Word Error Rate (WER)         35.44%
Character Error Rate (CER)    12.05%
Final Validation Loss         0.6692
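The card does not say which tool computed WER and CER or what text normalization was applied first. As a reference point, both metrics are edit distances divided by reference length: WER counts word substitutions, deletions, and insertions, and CER does the same over characters. A minimal pure-Python sketch (function names are hypothetical; `cer` counts spaces as characters, which is one of several conventions):

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,  # insertion: extra token in hypothesis
                prev[j - 1] + (r != h),  # substitution, free when tokens match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length (spaces included)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word edits over total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference")
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words
```

For a whole test set, summing edits and reference lengths across utterances before dividing (as `corpus_wer` does) is the usual convention, rather than averaging per-utterance rates. For example, `wer("یہ ایک کتاب ہے", "یہ ایک کتابیں ہے")` is 0.25: one substituted word out of four.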

Intended Uses & Limitations

Intended Use

  • Transcription of Urdu voice recordings.
  • Accessibility tools for Urdu speakers.
  • Foundation for downstream Urdu NLP tasks (e.g., sentiment analysis of speech).
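A minimal inference sketch using the Transformers `pipeline` API, assuming the repository ID shown on the model page (`hamza-amin/whisper-small-urdu-v2`) includes the processor files and Whisper's standard generation config. The helper names `transcribe_urdu` and `whisper_generate_kwargs` are hypothetical. Passing `language` and `task` through `generate_kwargs` pins decoding to Urdu transcription instead of relying on Whisper's language detection. Decoding a file path requires `ffmpeg` to be installed.

```python
# Hypothetical helper names; only pipeline() and its arguments come from Transformers.
MODEL_ID = "hamza-amin/whisper-small-urdu-v2"  # repository ID listed on the model page


def whisper_generate_kwargs(language: str = "urdu") -> dict:
    """Force Urdu transcription instead of Whisper's automatic language detection."""
    return {"language": language, "task": "transcribe"}


def transcribe_urdu(audio_path: str, device: int = -1) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint.

    device=-1 runs on CPU; pass a CUDA index such as 0 to use a GPU.
    """
    # Imported lazily so this module can be imported without Transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=device,
        chunk_length_s=30,  # split recordings longer than Whisper's 30 s window
    )
    result = asr(audio_path, generate_kwargs=whisper_generate_kwargs())
    return result["text"]
```

For example, `transcribe_urdu("sample.wav")` returns the transcript as a string (`sample.wav` is a placeholder path). The pipeline decodes and resamples file inputs to the 16 kHz rate Whisper expects, so recordings at other sample rates can be passed directly.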

Limitations

  • Background Noise: Performance may degrade in noisy environments or with multiple speakers.
  • Dialects: Primarily optimized for standard Urdu; accuracy may vary for regional accents and dialects.
  • Dataset Size: Trained on a 1,500-sample subset of Common Voice, so rare or domain-specific vocabulary may be transcribed poorly.

Training Procedure

Training Hyperparameters

  • Learning Rate: 5e-06 (a low rate for gentle fine-tuning that limits drift from the base weights)
  • Per-Device Batch Size: 8
  • Effective Batch Size: 32 (via gradient accumulation)
  • Steps: 300
  • Mixed Precision: FP16
  • Optimizer: AdamW
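The batch-size figures imply a gradient-accumulation factor, and together with the step count they give a rough idea of how many passes were made over the data. A small sketch of that arithmetic, assuming a single GPU and that all 1,500 Common Voice samples were used for training (the card states neither); `FineTuneConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the card; num_devices and train_samples are assumptions."""

    learning_rate: float = 5e-6
    per_device_batch_size: int = 8
    effective_batch_size: int = 32
    max_steps: int = 300
    num_devices: int = 1  # assumption: single GPU
    train_samples: int = 1_500  # assumption: all 1,500 samples used for training

    @property
    def gradient_accumulation_steps(self) -> int:
        """Micro-batches accumulated per optimizer step."""
        per_step = self.per_device_batch_size * self.num_devices
        if self.effective_batch_size % per_step:
            raise ValueError("effective batch size must be a multiple of per-device batch x devices")
        return self.effective_batch_size // per_step

    @property
    def samples_seen(self) -> int:
        """Total training examples processed over the run."""
        return self.max_steps * self.effective_batch_size

    @property
    def approx_epochs(self) -> float:
        """Approximate passes over the training set."""
        return self.samples_seen / self.train_samples
```

Under these assumptions the run used 4 accumulation steps and processed 300 × 32 = 9,600 samples, about 6.4 epochs over 1,500 clips; with two GPUs the accumulation factor would drop to 2. In Transformers, the corresponding `Seq2SeqTrainingArguments` fields would be `per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `max_steps`, and `fp16`, though the card does not include the training script.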

Training Progress

Step    Training Loss    Validation Loss
100     1.6249           1.0378
200     0.2065           0.6495
300     0.0993           0.6692

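Note: Training was stopped at 300 steps. Validation loss reached its minimum at step 200 (0.6495) and rose slightly to 0.6692 at step 300 while training loss kept falling, an early sign of overfitting rather than a plateau. The final validation loss reported above is from the step-300 checkpoint.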
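The loss table can be checked mechanically. The sketch below (helper names are hypothetical) finds the step with the lowest validation loss, replays a simple patience-based early-stopping rule, and measures the train/validation gap. Transformers offers comparable built-ins, `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"` in the training arguments and `EarlyStoppingCallback(early_stopping_patience=...)`, but the card does not say whether either was used.

```python
# Step -> (training loss, validation loss), copied from the training-progress table.
Log = dict[int, tuple[float, float]]

LOG: Log = {
    100: (1.6249, 1.0378),
    200: (0.2065, 0.6495),
    300: (0.0993, 0.6692),
}


def best_step(log: Log) -> int:
    """Step with the lowest validation loss."""
    return min(log, key=lambda step: log[step][1])


def early_stop_step(log: Log, patience: int = 1) -> int:
    """Step at which training would halt after `patience` consecutive
    evaluations without a new best validation loss."""
    best = float("inf")
    bad_evals = 0
    for step in sorted(log):
        val_loss = log[step][1]
        if val_loss < best:
            best, bad_evals = val_loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step
    return max(log)  # rule never triggered: training runs to the last logged step


def generalization_gap(log: Log, step: int) -> float:
    """Validation loss minus training loss at a given step."""
    train_loss, val_loss = log[step]
    return val_loss - train_loss
```

On this log, `best_step(LOG)` is 200, early stopping with a patience of one evaluation would halt at step 300, and the gap widens from about 0.44 at step 200 to about 0.57 at step 300.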

Framework Versions

  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.8.3
  • Tokenizers: 0.22.2

Developed by: Hamza Amin
Location: Ghulam Ishaq Khan Institute (GIKI), Pakistan.

Model Details

  • Repository: hamza-amin/whisper-small-urdu-v2
  • Weights Format: Safetensors
  • Model Size: 0.2B parameters
  • Tensor Type: F32