SpecDox: Faster Whisper Urdu-to-English Model (CTranslate2)

This is the CTranslate2 / Faster Whisper version of the highly optimized SpecDox Whisper Medium model. It performs Automatic Speech Recognition (ASR) and Audio Translation, taking spoken Urdu (اردو) and converting it into written English text.

This repository contains model weights designed for production environments. By utilizing the CTranslate2 engine, this model runs up to 4x faster and uses significantly less VRAM than the standard Hugging Face Transformers implementation, making it ideal for real-time and edge deployment.

🚀 Key Features

Blazing Fast Inference: Powered by CTranslate2 for real-time translation on both CPU and GPU.
Low VRAM Footprint: Fits on consumer-grade GPUs or lightweight cloud instances.
Massive Training Data: Trained on 127 hours of Urdu-to-English speech, expanded to 172 hours via data augmentation.
PEFT / LoRA Optimized: Fine-tuned with LoRA adapters, then merged and converted to CTranslate2 format.
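
The merge-then-convert step in the last bullet can be sketched roughly as follows. This is not the authors' actual pipeline. It is a minimal illustration that assumes the `transformers`, `peft`, and `ctranslate2` packages are installed. The function name `merge_and_convert` and the three directory arguments are hypothetical, and `float16` quantization is an assumed choice.

```python
from pathlib import Path


def merge_and_convert(adapter_dir: str, merged_dir: str, ct2_dir: str,
                      base_model: str = "openai/whisper-medium") -> Path:
    """Merge LoRA adapters into the base Whisper model, then convert to CTranslate2.

    All paths are placeholders; adapter_dir must contain a trained PEFT adapter.
    """
    # Heavy dependencies are imported lazily so this file can be imported without them.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    from peft import PeftModel
    from ctranslate2.converters import TransformersConverter

    # 1. Load the base model and attach the LoRA adapter weights.
    base = WhisperForConditionalGeneration.from_pretrained(base_model)
    peft_model = PeftModel.from_pretrained(base, adapter_dir)

    # 2. Fold the adapter deltas into the base weights and drop the PEFT wrappers.
    merged = peft_model.merge_and_unload()
    merged.save_pretrained(merged_dir)

    # 3. Save the processor (tokenizer + feature extractor) next to the merged weights.
    WhisperProcessor.from_pretrained(base_model).save_pretrained(merged_dir)

    # 4. Convert the merged checkpoint to CTranslate2 format (float16 is an assumption).
    TransformersConverter(merged_dir).convert(ct2_dir, quantization="float16")
    return Path(ct2_dir)
```

faster-whisper also expects the tokenizer and preprocessor files alongside the converted weights, so those may need to be copied into the output directory after conversion.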

📊 Evaluation & Performance

The table below compares SpecDox models against standard baselines. Converting to Faster Whisper leaves accuracy essentially unchanged (WER moves from 36.25 to 36.28, BLEU from 53.30 to 53.24) while improving throughput.

Model	WER% ↓	BLEU ↑	METEOR ↑	BERTScore F1 ↑	Rank
SpecDox-Whisper-Medium (Standard)	36.25	53.30	0.7804	0.9405	#1
SpecDox-faster-medium (Faster Whisper)	36.28	53.24	0.7811	0.9402	#2
Whisper Large-v3	42.88	46.86	0.7105	0.9270	#3
Whisper Medium (Baseline)	45.33	44.16	0.6882	0.9226	#4
SeamlessM4T Medium	72.04	18.84	0.3697	0.8429	#5

Engineering Takeaway: The Faster Whisper version of SpecDox lowers WER by 6.6 percentage points compared to OpenAI's Whisper Large-v3 (36.28 vs. 42.88, roughly a 15% relative reduction), at a fraction of the computational cost.
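
The arithmetic behind that comparison can be reproduced from the table. The short sketch below uses only the WER values listed above. The helper name `wer_reduction` is illustrative.

```python
# WER values (%) copied from the evaluation table above.
wer = {
    "SpecDox-faster-medium": 36.28,
    "Whisper Large-v3": 42.88,
    "Whisper Medium (Baseline)": 45.33,
}


def wer_reduction(ours: float, baseline: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = baseline - ours
    relative = 100.0 * absolute / baseline
    return absolute, relative


for name in ("Whisper Large-v3", "Whisper Medium (Baseline)"):
    abs_pts, rel_pct = wer_reduction(wer["SpecDox-faster-medium"], wer[name])
    print(f"vs {name}: {abs_pts:.2f} points absolute, {rel_pct:.1f}% relative")
```

Reporting both figures avoids ambiguity: "6.6%" could otherwise be read as either a percentage-point difference or a relative improvement.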

💻 Usage

This model uses the CTranslate2 engine. Use the faster-whisper library instead of transformers.

1. Install

pip install faster-whisper

2. Run Inference

from faster_whisper import WhisperModel

# Load model from HuggingFace Hub or a local path
model_path = "Shzaib/SpecDox-Faster-Whisper"

# GPU with FP16 — use device="cpu" and compute_type="int8" if no GPU available
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# Translate Urdu audio to English
audio_file = "path/to/your/urdu_audio.wav"

# task="translate" → English output | language="ur" → skip language detection
segments, info = model.transcribe(audio_file, task="translate", language="ur")

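# language="ur" was set explicitly, so info.language echoes that setting rather than a detection result
print(f"Source language '{info.language}' (probability {info.language_probability:.2f})")
print("--- Translation ---")

# segments is a lazy generator: decoding runs as the loop consumes it
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")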
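
The snippet above hard-codes the GPU settings and mentions the CPU fallback only in a comment. The sketch below shows one way to choose between the two at runtime. The helpers `pick_runtime`, `count_cuda_devices`, and `load_model` are hypothetical names introduced for illustration. It assumes `ctranslate2` (installed as a dependency of faster-whisper) is used to count CUDA devices.

```python
def pick_runtime(cuda_devices: int) -> tuple[str, str]:
    """Map the number of visible CUDA devices to (device, compute_type)."""
    if cuda_devices > 0:
        return "cuda", "float16"  # GPU path from the usage example
    return "cpu", "int8"          # CPU fallback suggested in the usage example


def count_cuda_devices() -> int:
    """Return the number of CUDA devices CTranslate2 can see, or 0 if unavailable."""
    try:
        import ctranslate2
        return ctranslate2.get_cuda_device_count()
    except ImportError:
        return 0


def load_model(model_path: str = "Shzaib/SpecDox-Faster-Whisper"):
    """Load the model with settings matched to the available hardware."""
    from faster_whisper import WhisperModel  # imported lazily; requires faster-whisper
    device, compute_type = pick_runtime(count_cuda_devices())
    return WhisperModel(model_path, device=device, compute_type=compute_type)
```

Calling `load_model()` then returns a `WhisperModel` that can be used with `transcribe(...)` exactly as in the example above.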

📄 License

This model is released under the Apache 2.0 License.

Base model: openai/whisper-medium