SpecDox: Faster Whisper Urdu-to-English Model (CTranslate2)
This is the CTranslate2 / Faster Whisper version of the highly optimized SpecDox Whisper Medium model. It performs Automatic Speech Recognition (ASR) and Audio Translation, taking spoken Urdu (اردو) and converting it into written English text.
This repository contains model weights designed for production environments. By utilizing the CTranslate2 engine, this model runs up to 4x faster and uses significantly less VRAM than the standard Hugging Face Transformers implementation, making it ideal for real-time and edge deployment.
Key Features
- Blazing Fast Inference: Powered by CTranslate2 for real-time translation on both CPU and GPU.
- Low VRAM Footprint: Fits on consumer-grade GPUs or lightweight cloud instances.
- Massive Training Data: Trained on 127 hours of Urdu-to-English speech, expanded to 172 hours via data augmentation.
- PEFT / LoRA Optimized: Fine-tuned with LoRA adapters, then merged and converted to CTranslate2 format.
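The merge-then-convert step in the last bullet could look roughly like the sketch below. It is an illustration, not the authors' actual pipeline: the three directory arguments are hypothetical, the `float16` quantization and the list of copied files are assumptions, and the base checkpoint is taken to be `openai/whisper-medium`. It relies on `peft`'s `merge_and_unload()` and CTranslate2's `TransformersConverter`.

```python
# Tokenizer/feature-extractor files commonly copied next to the converted
# weights so faster-whisper can load them (assumed, not confirmed by the card).
COPY_FILES = ["tokenizer.json", "preprocessor_config.json"]


def merge_and_convert(
    adapter_dir: str,   # hypothetical path to the trained LoRA adapter
    merged_dir: str,    # hypothetical scratch directory for the merged model
    output_dir: str,    # hypothetical destination for the CTranslate2 model
    base_model: str = "openai/whisper-medium",
    quantization: str = "float16",
) -> str:
    """Merge LoRA weights into the base model, then convert to CTranslate2."""
    # Heavy dependencies are imported lazily so the module imports cheaply.
    from ctranslate2.converters import TransformersConverter
    from peft import PeftModel
    from transformers import (
        AutoTokenizer,
        WhisperFeatureExtractor,
        WhisperForConditionalGeneration,
    )

    # 1. Load the base model and fold the adapter weights into it.
    base = WhisperForConditionalGeneration.from_pretrained(base_model)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # 2. Save the merged model with its tokenizer and feature extractor.
    merged.save_pretrained(merged_dir)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(merged_dir)
    WhisperFeatureExtractor.from_pretrained(base_model).save_pretrained(merged_dir)

    # 3. Convert the merged checkpoint to CTranslate2 format.
    converter = TransformersConverter(merged_dir, copy_files=COPY_FILES)
    return converter.convert(output_dir, quantization=quantization, force=True)
```

Merging before conversion matters because CTranslate2 converts a plain Transformers checkpoint; the adapter weights have to be folded into the base model first.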
Evaluation & Performance
The table below compares SpecDox models against standard baselines. The Faster Whisper conversion stays within 0.03 WER points and 0.06 BLEU points of the original Transformers model while improving throughput.
| Model | WER% ↓ | BLEU ↑ | METEOR ↑ | BERTScore F1 ↑ | Rank |
|---|---|---|---|---|---|
| SpecDox-Whisper-Medium (Standard) | 36.25 | 53.30 | 0.7804 | 0.9405 | #1 |
| SpecDox-faster-medium (Faster Whisper) | 36.28 | 53.24 | 0.7811 | 0.9402 | #2 |
| Whisper Large-v3 | 42.88 | 46.86 | 0.7105 | 0.9270 | #3 |
| Whisper Medium (Baseline) | 45.33 | 44.16 | 0.6882 | 0.9226 | #4 |
| SeamlessM4T Medium | 72.04 | 18.84 | 0.3697 | 0.8429 | #5 |
Engineering Takeaway: The Faster Whisper version of SpecDox achieves a 6.6 percentage-point absolute reduction in WER compared to OpenAI's Whisper Large-v3 (36.28 vs. 42.88), at a fraction of the computational cost.
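To make the takeaway's arithmetic explicit, the short sketch below recomputes the WER gap from the table, both as an absolute difference in percentage points and as a relative reduction. The helper name `wer_reduction` is illustrative; the numbers are copied from the table.

```python
# WER values (percent) copied from the evaluation table.
wer = {
    "SpecDox-faster-medium": 36.28,
    "Whisper Large-v3": 42.88,
    "Whisper Medium (Baseline)": 45.33,
}


def wer_reduction(ours: float, baseline: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = baseline - ours
    relative = 100.0 * absolute / baseline
    return absolute, relative


ours = wer["SpecDox-faster-medium"]
for name in ("Whisper Large-v3", "Whisper Medium (Baseline)"):
    absolute, relative = wer_reduction(ours, wer[name])
    print(f"vs {name}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Against Whisper Large-v3 this gives 6.60 points absolute, or about 15.4% relative. Stating which of the two is meant avoids ambiguity when quoting the result.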
Usage
This model uses the CTranslate2 engine. Use the faster-whisper library instead of transformers.
1. Install
pip install faster-whisper
2. Run Inference
from faster_whisper import WhisperModel
# Load model from Hugging Face Hub or a local path
model_path = "Shzaib/SpecDox-Faster-Whisper"
# GPU with FP16; use device="cpu" and compute_type="int8" if no GPU is available
model = WhisperModel(model_path, device="cuda", compute_type="float16")
# Translate Urdu audio to English
audio_file = "path/to/your/urdu_audio.wav"
# task="translate" -> English output | language="ur" -> skip language detection
segments, info = model.transcribe(audio_file, task="translate", language="ur")
print(f"Detected language '{info.language}' with probability {info.language_probability:.2f}")
print("--- Translation ---")
# segments is a generator; decoding runs as the loop consumes it
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
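For machines without a GPU, the CPU/INT8 fallback mentioned in the comments can be wrapped in a small helper. This is a minimal sketch: `pick_runtime` and `translate_file` are hypothetical names, and the caller is assumed to know whether a GPU is present. Only `WhisperModel` and `transcribe` come from faster-whisper.

```python
def pick_runtime(has_gpu: bool) -> dict:
    """Choose the device and compute type suggested in the usage notes."""
    if has_gpu:
        return {"device": "cuda", "compute_type": "float16"}
    return {"device": "cpu", "compute_type": "int8"}


def translate_file(audio_path: str, has_gpu: bool = False) -> str:
    """Translate one Urdu audio file to English and return the joined text."""
    # Imported lazily so pick_runtime works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("Shzaib/SpecDox-Faster-Whisper", **pick_runtime(has_gpu))
    segments, _info = model.transcribe(audio_path, task="translate", language="ur")
    # `segments` is a generator: decoding happens while it is consumed here.
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `translate_file("path/to/your/urdu_audio.wav", has_gpu=False)` would return the whole translation as one string. For batch jobs, create the `WhisperModel` once and reuse it rather than reloading it per file.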
License
This model is released under the Apache 2.0 License.
Base model: openai/whisper-medium