whisper-small-urdu-int8-ct2
INT8 Quantized Version of khawajaaliarshad/whisper-small-urdu
Converted using CTranslate2 for faster inference on CPU/GPU, optimized for mobile and edge deployment.
Original Model
- Name: whisper-small-urdu
- Author: Khawaja Ali Arshad
- License: Apache 2.0
- This repository contains a quantized INT8 version; no retraining was performed.
Conversion Details
- Conversion tool: CTranslate2
- Quantization: INT8
- Purpose: Reduce model size and accelerate inference for low-resource environments (mobile/CPU/GPU).
- Backend support: Faster-Whisper compatible.
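The conversion step listed above can be sketched as follows. This is a minimal sketch, assuming the `ct2-transformers-converter` command-line tool that ships with CTranslate2 was used; the exact command behind this repository is not documented here. The copied files (`tokenizer.json`, `preprocessor_config.json`) and the helper name `build_convert_command` are assumptions made for illustration.

```python
import shlex


def build_convert_command(
    source_model: str = "khawajaaliarshad/whisper-small-urdu",
    output_dir: str = "whisper-small-urdu-int8-ct2",
    quantization: str = "int8",
) -> list[str]:
    """Build the argument list for CTranslate2's Transformers converter CLI."""
    return [
        "ct2-transformers-converter",
        "--model", source_model,
        "--output_dir", output_dir,
        # Assumption: these files exist in the source repo and are copied
        # so that faster-whisper can load the tokenizer and feature extractor.
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", quantization,
    ]


command = build_convert_command()
# Print the command instead of running it; running it needs
# `pip install ctranslate2 transformers[torch]` and network access.
print(shlex.join(command))
```

To actually run the conversion, the list can be passed to `subprocess.run(command, check=True)`.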
Benchmark (T4 GPU, 5-second audio sample)
| Version | Inference Time | Notes |
|---|---|---|
| Original FP32 | 9.07 seconds | Standard Hugging Face PyTorch model |
| INT8 Quantized | 0.54 seconds | Using CTranslate2, about 17x speed-up (9.07 s / 0.54 s ≈ 16.8) |
- Model size: ~967 MB for the original; the INT8 version is smaller, at roughly 1/4 to 1/3 of that (an estimate; check the actual folder size).
⚠️ Note: Actual speed may vary depending on device, CPU/GPU, and batch size.
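The arithmetic behind the benchmark figures can be checked with a few lines. This sketch only uses the numbers from the table and the size estimate above; `speedup` and `folder_size_mb` are hypothetical helper names, and the 1/4 to 1/3 range remains an estimate until the real folder is measured.

```python
from pathlib import Path


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_s / optimized_s


def folder_size_mb(path: str) -> float:
    """Total size of all files under `path`, in megabytes (10^6 bytes)."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file()) / 1e6


# Timings from the table (T4 GPU, 5-second audio sample).
print(f"Speed-up: {speedup(9.07, 0.54):.1f}x")  # Speed-up: 16.8x

# Estimated INT8 size range from the ~967 MB original.
original_mb = 967
print(f"Estimated INT8 size: {original_mb / 4:.0f}-{original_mb / 3:.0f} MB")
# Estimated INT8 size: 242-322 MB
```

Calling `folder_size_mb("whisper-small-urdu-int8-ct2")` on a downloaded copy replaces the estimate with the measured size.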
Usage Example
from faster_whisper import WhisperModel

# Load the INT8 quantized model (path to the local model directory)
model = WhisperModel(
    "whisper-small-urdu-int8-ct2",
    device="cpu",  # or "cuda" for GPU
    compute_type="int8",
)

# Transcribe an audio file
segments, info = model.transcribe("audio.wav")
print("Detected language:", info.language)

# segments is a generator; transcription runs as it is iterated
for segment in segments:
    print(segment.text)
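A variation on the usage example that keeps timestamps and pins the language may be useful. This is a minimal sketch, assuming the language should be forced to Urdu (`language="ur"`) rather than auto-detected; `format_segment` and `transcribe_urdu` are hypothetical helper names, and the model path is the same local directory as above.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcription segment as '[start -> end] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_urdu(audio_path: str,
                    model_path: str = "whisper-small-urdu-int8-ct2",
                    device: str = "cpu") -> list[str]:
    """Transcribe a file and return one formatted line per segment."""
    # Imported lazily so the formatting helper works without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device=device, compute_type="int8")
    # Assumption: the audio is Urdu, so language detection is skipped.
    segments, _info = model.transcribe(audio_path, language="ur")
    # Iterating the generator is what actually runs the transcription.
    return [format_segment(s.start, s.end, s.text) for s in segments]


print(format_segment(0.0, 2.5, " example text "))
# [0.00s -> 2.50s] example text
```

With faster-whisper installed and the model directory present, `transcribe_urdu("audio.wav")` returns the timestamped lines.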
License & Attribution
This model is released under the Apache 2.0 License, the same as the original model.
Original model by Khawaja Ali Arshad: khawajaaliarshad/whisper-small-urdu
INT8 quantization done by Muhammad Khubaib Ahmad using CTranslate2.
Please retain proper attribution when using this model.
Recommendations
- For mobile/edge: Use the INT8 version for faster inference and lower memory usage.
- For training/fine-tuning: Use the original FP32 model; the quantized INT8 model is not suitable for further training.
- For benchmarking: Test on your target hardware for accurate latency measurements.
- Compatibility: Fully compatible with the faster-whisper API.
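For the benchmarking recommendation, a small timing harness along these lines could be used on the target hardware. This is a sketch, not the procedure behind the table above; `benchmark` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time


def benchmark(fn, warmup: int = 1, runs: int = 5) -> dict:
    """Time `fn()` several times and report median and minimum latency."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (model load, caches)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "runs": runs,
    }


# Stand-in workload so the sketch runs without a model or audio file.
result = benchmark(lambda: sum(range(100_000)), warmup=1, runs=3)
print(f"median {result['median_s']:.4f}s over {result['runs']} runs")
```

When timing faster-whisper, the callable should consume the segment generator, for example `lambda: list(model.transcribe("audio.wav")[0])`, because transcription only runs while the segments are iterated.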
Model tree for Inferencelab/whisper-small-urdu-int8-ct2
Base model
openai/whisper-small