# Whisper Pidgin v1

A LoRA adapter on `openai/whisper-large-v3-turbo`, fine-tuned for Nigerian Pidgin English (Naija, `pcm`) automatic speech recognition.
Trained on ~8.6 hours of curated Pidgin audio with a single LoRA fine-tune on a free Kaggle T4 GPU. Achieves 21.37% WER / 9.90% CER on the held-out test set — an 8.2 pp absolute (28% relative) improvement over the strongest published Pidgin ASR baseline on the same data.
The adapter is 26 MB; combined with the 809M-parameter base model it
runs in real-time on a laptop CPU via faster-whisper.
- 🎤 Try it: HF Space demo
- 💻 Source + reproduce: https://github.com/michaelodafe/Naija-Pidgin-Whisper
- 📦 Dataset used: michaelodafe/pidgin-asr-combined
- 📖 Full design notes: documentation.md in the source repo
## Results
| Model | Test WER | Test CER |
|---|---|---|
| Whisper Pidgin v1 (this model) | 21.37% | 9.90% |
| Wav2Vec2-XLSR-53 (published baseline on same test set) | 29.6% | — |
Test split: 893 clips, ~1.78 hours, from
asr-nigerian-pidgin/nigerian-pidgin-1.0.
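The WER/CER figures above can be checked with the `jiwer` package. The snippet below is a minimal sketch, not the repo's evaluation script; the two transcripts are placeholders standing in for real test-set references and predictions.

```python
# Minimal WER/CER check with jiwer; refs/hyps are placeholder strings here,
# not the actual test-set transcripts.
import jiwer

refs = ["wetin dey happen for lagos today"]
hyps = ["wetin de happen for lagos today"]

print(f"WER: {jiwer.wer(refs, hyps):.2%}")
print(f"CER: {jiwer.cer(refs, hyps):.2%}")
```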
## How to use

### Quick start — transformers + peft
```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

BASE = "openai/whisper-large-v3-turbo"
ADAPTER = "michaelodafe/whisper-pidgin-v1"

processor = WhisperProcessor.from_pretrained(BASE, language="english", task="transcribe")
base = WhisperForConditionalGeneration.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload().to("cuda")

model.generation_config.language = "english"
model.generation_config.task = "transcribe"
model.generation_config.forced_decoder_ids = None
model.generation_config.suppress_tokens = []

# audio: a 16 kHz mono numpy array
# Cast the features to fp16 so they match the model's dtype.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(inputs.input_features, max_length=225)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
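For file-based transcription, the merged model and processor from the snippet above can also be wrapped in a standard `transformers` ASR pipeline. This is an optional addition rather than part of the original quick start; `clip.wav` is a placeholder path, and `chunk_length_s=30` is one way to work around Whisper's 30-second window for longer files.

```python
# Optional: wrap the merged model in an ASR pipeline (reuses `model` and
# `processor` from the quick-start snippet above).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device="cuda",
    chunk_length_s=30,  # chunked decoding for audio longer than 30 s
)
print(asr("clip.wav")["text"])  # "clip.wav" is a placeholder path
```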
### Production path — faster-whisper (4–6× faster)
The source repo includes a one-shot script that merges the adapter into
the base, exports to CTranslate2 int8_float16, and runs streaming
inference via faster-whisper:
```bash
git clone https://github.com/michaelodafe/Naija-Pidgin-Whisper.git
cd Naija-Pidgin-Whisper
pip install -r requirements.txt
HF_HUB_DISABLE_XET=1 python infer/01_merge_and_convert.py  # ~5 min, one-time
python infer/02_streaming_demo.py                          # live mic → transcript
```
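The scripts above are the supported path. For readers curious what they do, here is a rough sketch of the merge, CTranslate2 export, and faster-whisper steps; the `merged/` and `ct2-int8/` directory names and `clip.wav` are illustrative placeholders, not the scripts' actual paths.

```python
# Rough sketch only; infer/01_merge_and_convert.py is the supported path.
import ctranslate2.converters
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# 1. Merge the LoRA adapter into the base model and save it with its processor.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")
merged = PeftModel.from_pretrained(base, "michaelodafe/whisper-pidgin-v1").merge_and_unload()
merged.save_pretrained("merged")
WhisperProcessor.from_pretrained("openai/whisper-large-v3-turbo").save_pretrained("merged")

# 2. Export to CTranslate2 with int8_float16 quantization.
ctranslate2.converters.TransformersConverter(
    "merged", copy_files=["tokenizer.json", "preprocessor_config.json"]
).convert("ct2-int8", quantization="int8_float16")

# 3. Transcribe with faster-whisper (runs on CPU with int8 compute).
from faster_whisper import WhisperModel

fw = WhisperModel("ct2-int8", device="cpu", compute_type="int8")
segments, info = fw.transcribe("clip.wav", language="en", vad_filter=True)
print(" ".join(seg.text.strip() for seg in segments))
```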
## Training data
| Source | Clips | Hours | Notes |
|---|---|---|---|
| asr-nigerian-pidgin/nigerian-pidgin-1.0 | 4,277 | ~8.6 h | 10 native speakers, 16 kHz, CC-BY-4.0 |
| Rexe/nigerian-pidgin-speech | 73 | ~0.05 h | Eval-only; single YouTube source |
Combined and re-published as
michaelodafe/pidgin-asr-combined
with a unified schema:
- train: 2,708 clips · 5.41 h
- validation: 677 clips · 1.37 h
- test: 893 clips · 1.78 h
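A minimal sketch for loading the combined dataset with the `datasets` library and checking the split sizes listed above:

```python
from datasets import load_dataset

# Load the combined dataset and confirm the split sizes listed above.
ds = load_dataset("michaelodafe/pidgin-asr-combined")
print({split: ds[split].num_rows for split in ds})  # expect 2708 / 677 / 893
print(ds["train"].column_names)                     # inspect the unified schema
```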
## Training procedure
- Base model: `openai/whisper-large-v3-turbo` (809M params)
- Method: LoRA fine-tune (PEFT)
- LoRA config: `r=32, alpha=64, target_modules=["q_proj","v_proj"], dropout=0.05` (see the sketch after this list)
- Trainable parameters: ~3M
- Effective batch size: 16 (4 per device × 4 grad-accum)
- Optimizer: AdamW, learning rate `1e-4`, warmup ratio 0.05
- Epochs: 5 (845 steps total)
- Mixed precision: fp16
- Hardware: Kaggle free tier, 1× NVIDIA T4 (16 GB VRAM)
- Training time: ~3 h 47 min
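For reproduction, the LoRA settings listed above map onto `peft` roughly as follows. This is a sketch based only on the hyperparameters stated in this card, not the repo's actual training script:

```python
# Sketch of the LoRA setup described above, not the repo's exact training code.
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # ~3M trainable parameters
```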
Validation trajectory:
| Step | Train loss | Val loss | Val WER | Val CER |
|---|---|---|---|---|
| 200 | 2.91 | 0.81 | 25.97% | 12.72% |
| 400 | 2.25 | 0.73 | 23.25% | 11.49% |
| 600 | 1.94 | 0.71 | 22.39% | 11.23% |
| 800 | 2.09 | 0.70 | 21.96% | 11.02% |
Test WER (21.37%) was slightly better than validation WER (21.96%), indicating clean generalization with no overfitting.
## Limitations and bias
- Domain: Trained on read-style news Pidgin (BBC News Pidgin register, single-speaker recordings). Casual conversational Pidgin, shouting, music backgrounds, and group conversation will all show higher error rates.
- Orthography: The model normalizes some Pidgin orthographic variants (`hin`↔`him`, `kain`↔`kind`, `neva`↔`never`). This is partially a label-inconsistency artifact in the source dataset itself; future versions could use orthography-aware metrics.
- Code-switching: Pidgin↔Standard English switching mid-utterance is handled, but heavy code-switching with Yoruba / Igbo / Hausa was not in training and is likely to fail.
- 30-second window: Audio longer than 30 seconds is silently truncated by Whisper's input encoder. For longer-form audio, segment with VAD first (see `infer/02_streaming_demo.py` in the source repo).
- Speaker coverage: Training data has 10 speakers, all aged 20–28, recorded in a single accent register. Older speakers or different regional accents may underperform.
- Number format: The model sometimes outputs `60 000` where the reference is `60000`, etc. A simple postprocess pass (in `infer/decode.py`) fixes most of these.
## Decode-time enhancements (Path A)
The source repo ships a `decode.py` helper that adds two zero-cost enhancements:

- `initial_prompt` hotwords — a Pidgin-style context sentence listing common Nigerian proper nouns and Pidgin function words. Biases the decoder toward correct vocabulary, especially proper nouns.
- Postprocess — strips punctuation the labels don't use, merges digit groups, drops in-number commas.
Together these provide an additional ~1–2 pp WER improvement on top of the 21.37% reported above, at zero inference cost. They are already enabled in the streaming demo and the HF Inference Endpoint handler.
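The exact prompt text and postprocess rules live in `decode.py` in the source repo; the sketch below only illustrates the mechanism with faster-whisper's `initial_prompt` argument and a toy number-merging regex. The prompt string, model path, and regex here are stand-ins, not the repo's actual values.

```python
import re
from faster_whisper import WhisperModel

# Illustrative only: the real prompt text and postprocess rules live in the
# repo's decode.py. The prompt, model path, and regex below are stand-ins.
PIDGIN_PROMPT = "Dis na Naija Pidgin tori about Lagos, Abuja, naira and wahala."

model = WhisperModel("ct2-int8", device="cpu", compute_type="int8")  # path from the export sketch above
segments, _ = model.transcribe("clip.wav", language="en", initial_prompt=PIDGIN_PROMPT)
text = " ".join(seg.text.strip() for seg in segments)

# Toy postprocess: merge spaced digit groups ("60 000" -> "60000") and drop
# in-number commas ("60,000" -> "60000").
text = re.sub(r"(?<=\d)[ ,](?=\d{3}\b)", "", text)
print(text)
```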
## License and attribution
- Code & adapter weights: MIT
- Base model (`openai/whisper-large-v3-turbo`): MIT (Whisper)
- Training data (`asr-nigerian-pidgin/nigerian-pidgin-1.0`): CC-BY-4.0 — attribution to the original dataset authors is required for any downstream use.
If you use this model in research or a product, please credit:
- The dataset: `asr-nigerian-pidgin/nigerian-pidgin-1.0`
- OpenAI's Whisper paper (Radford et al., 2022).
- This model card / repo.
## Citation
```bibtex
@misc{odafe2026pidginwhisper,
  title  = {Whisper Pidgin v1: Nigerian Pidgin English Speech-to-Text},
  author = {Odafe, Michael},
  year   = {2026},
  url    = {https://huggingface.co/michaelodafe/whisper-pidgin-v1},
  note   = {LoRA fine-tune of openai/whisper-large-v3-turbo}
}
```
## Acknowledgments
- The `asr-nigerian-pidgin/nigerian-pidgin-1.0` dataset team for releasing the only sizeable open Pidgin ASR corpus.
- OpenAI for Whisper.
- Hugging Face for hosting and the `transformers` / `datasets` / `peft` libraries.
- SYSTRAN for `faster-whisper` and CTranslate2.
- The Silero team for VAD.
- Kaggle for free GPU compute.