Whisper Pidgin v1

LoRA adapter on openai/whisper-large-v3-turbo fine-tuned for Nigerian Pidgin English (Naija, pcm) automatic speech recognition.

Trained on ~8.6 hours of curated Pidgin audio with a single LoRA fine-tune on a free Kaggle T4 GPU. Achieves 21.37% WER / 9.90% CER on the held-out test set — an 8.2 pp absolute (28% relative) improvement over the strongest published Pidgin ASR baseline on the same data.

The adapter is 26 MB; combined with the 809M-parameter base model it runs in real time on a laptop CPU via faster-whisper.

Results

| Model | Test WER | Test CER |
|---|---|---|
| Whisper Pidgin v1 (this model) | 21.37% | 9.90% |
| Wav2Vec2-XLSR-53 (published baseline on same test set) | 29.6% | — |

Test split: 893 clips, ~1.78 hours, from asr-nigerian-pidgin/nigerian-pidgin-1.0.

How to use

Quick start — transformers + peft

import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

BASE = "openai/whisper-large-v3-turbo"
ADAPTER = "michaelodafe/whisper-pidgin-v1"

processor = WhisperProcessor.from_pretrained(BASE, language="english", task="transcribe")
base = WhisperForConditionalGeneration.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload().to("cuda")
model.generation_config.language = "english"
model.generation_config.task = "transcribe"
model.generation_config.forced_decoder_ids = None
model.generation_config.suppress_tokens = []

# audio: a 16 kHz mono numpy array (see the loading note below)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to("cuda", torch.float16)  # match the model's fp16 weights
out = model.generate(input_features, max_length=225)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
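
If you are starting from an audio file rather than an array, librosa is one convenient way to get the expected 16 kHz mono input (the file path below is a placeholder):

import librosa

# Decode and resample to 16 kHz mono; "clip.wav" is a placeholder path
audio, _ = librosa.load("clip.wav", sr=16000, mono=True)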

Production path — faster-whisper (4–6× faster)

The source repo includes a one-shot script that merges the adapter into the base, exports to CTranslate2 int8_float16, and runs streaming inference via faster-whisper:

git clone https://github.com/michaelodafe/Naija-Pidgin-Whisper.git
cd Naija-Pidgin-Whisper
pip install -r requirements.txt
HF_HUB_DISABLE_XET=1 python infer/01_merge_and_convert.py   # ~5 min, one-time
python infer/02_streaming_demo.py                            # live mic → transcript
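
Once converted, the CTranslate2 model loads directly with faster-whisper. A minimal sketch, assuming the conversion script writes to a local directory (the model path and audio file below are placeholders; check the repo scripts for the exact output location):

from faster_whisper import WhisperModel

# "ct2-whisper-pidgin" is a placeholder for the conversion script's output directory;
# int8 compute works on CPU (the exported int8_float16 weights target GPU)
model = WhisperModel("ct2-whisper-pidgin", device="cpu", compute_type="int8")

segments, _ = model.transcribe("clip.wav", language="en", beam_size=5)
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")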

Training data

| Source | Clips | Hours | Notes |
|---|---|---|---|
| asr-nigerian-pidgin/nigerian-pidgin-1.0 | 4,277 | ~8.6 h | 10 native speakers, 16 kHz, CC-BY-4.0 |
| Rexe/nigerian-pidgin-speech | 73 | ~0.05 h | Eval-only; single YouTube source |

Combined and re-published as michaelodafe/pidgin-asr-combined with a unified schema:

  • train: 2,708 clips · 5.41 h
  • validation: 677 clips · 1.37 h
  • test: 893 clips · 1.78 h
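
The combined dataset loads with the datasets library; a minimal sketch, assuming the audio column is named "audio":

from datasets import Audio, load_dataset

ds = load_dataset("michaelodafe/pidgin-asr-combined")
# Decode audio at 16 kHz to match Whisper's expected input rate
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
print(ds)  # DatasetDict with train / validation / test splits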

Training procedure

  • Base model: openai/whisper-large-v3-turbo (809M params)
  • Method: LoRA fine-tune (PEFT)
  • LoRA config: r=32, alpha=64, target_modules=["q_proj","v_proj"], dropout=0.05
  • Trainable parameters: ~3M
  • Effective batch size: 16 (4 per device × 4 grad-accum)
  • Optimizer: AdamW, learning rate 1e-4, warmup ratio 0.05
  • Epochs: 5 (845 steps total)
  • Mixed precision: fp16
  • Hardware: Kaggle free tier, 1× NVIDIA T4 (16 GB VRAM)
  • Training time: ~3 h 47 min
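
For reference, the LoRA setup above corresponds roughly to the following PEFT configuration (a sketch, not the exact training script; the bias setting is an assumption):

from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",  # assumption; not stated in the config above
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly the ~3M figure above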

Validation trajectory:

| Step | Train loss | Val loss | Val WER | Val CER |
|---|---|---|---|---|
| 200 | 2.91 | 0.81 | 25.97% | 12.72% |
| 400 | 2.25 | 0.73 | 23.25% | 11.49% |
| 600 | 1.94 | 0.71 | 22.39% | 11.23% |
| 800 | 2.09 | 0.70 | 21.96% | 11.02% |

Test WER (21.37%) was slightly better than validation WER (21.96%), indicating clean generalization with no sign of overfitting.
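
For context, WER and CER of hypotheses against references can be computed with jiwer; shown here purely as an illustration, not necessarily the exact evaluation script behind the numbers above:

import jiwer

references = ["wetin dey happen", "how you dey"]
hypotheses = ["wetin they happen", "how you dey"]

print(f"WER: {jiwer.wer(references, hypotheses):.2%}")
print(f"CER: {jiwer.cer(references, hypotheses):.2%}")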

Limitations and bias

  • Domain: Trained on read-style news Pidgin (BBC News Pidgin register, single-speaker recordings). Casual conversational Pidgin, shouting, music backgrounds, and group conversation will all show higher error rates.
  • Orthography: The model normalizes some Pidgin orthographic variants (hin ↔ him, kain ↔ kind, neva ↔ never). This is partially a label-inconsistency artifact in the source dataset itself; future versions could use orthography-aware metrics.
  • Code-switching: Pidgin↔Standard English mid-utterance is handled, but heavy code-switching with Yoruba / Igbo / Hausa was not in training and is likely to fail.
  • 30-second window: Audio longer than 30 seconds is silently truncated by Whisper's input encoder. For longer-form audio, segment with VAD first (see infer/02_streaming_demo.py in the source repo and the sketch after this list).
  • Speaker coverage: Training data has 10 speakers, all aged 20–28, recorded in a single accent register. Older speakers or different regional accents may underperform.
  • Number/format: The model sometimes outputs 60 000 where the reference is 60000, etc. A simple postprocess pass (in infer/decode.py) fixes most of these.
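
For long-form audio, one way to segment speech before transcription is Silero VAD; a minimal sketch (the chunking strategy and file path are assumptions, not the repo's exact logic):

import torch

# Load Silero VAD and its helper functions from torch.hub
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("long_recording.wav", sampling_rate=16000)  # placeholder path
speech_timestamps = get_speech_timestamps(wav, vad_model, sampling_rate=16000)

# Transcribe each detected speech region separately; regions longer than 30 s
# would still need further splitting before being fed to Whisper
for ts in speech_timestamps:
    chunk = wav[ts["start"]:ts["end"]].numpy()
    # ... pass `chunk` to the transcription code from the quick-start example above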

Decode-time enhancements (Path A)

The source repo ships a decode.py helper that adds two zero-cost enhancements:

  • initial_prompt hotwords — a Pidgin-style context sentence listing common Nigerian proper nouns and Pidgin function words. Biases the decoder toward correct vocabulary, especially proper nouns.
  • Postprocess — strips punctuation the labels don't use, merges digit groups, drops in-number commas.

Together these provide an additional ~1–2 pp WER improvement over the 21.37% reported above, at zero inference cost. Both are already enabled in the streaming demo and the HF Inference Endpoint handler.
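
A rough illustration of both ideas on the faster-whisper path (the prompt text, regexes, and model path here are illustrative, not the exact contents of infer/decode.py):

import re
from faster_whisper import WhisperModel

# Illustrative hotword prompt; the real one lists more Nigerian proper nouns and Pidgin function words
INITIAL_PROMPT = "Dis na Naija Pidgin tori about Lagos, Abuja, NEPA, wahala, wetin, dey, una."

def postprocess(text: str) -> str:
    text = re.sub(r"(\d)[ ,](?=\d{3}\b)", r"\1", text)  # "60 000" / "60,000" -> "60000"
    text = re.sub(r"[^\w\s']", "", text)                 # strip punctuation the labels don't use
    return " ".join(text.split())

model = WhisperModel("ct2-whisper-pidgin", device="cpu", compute_type="int8")  # placeholder path
segments, _ = model.transcribe("clip.wav", language="en", initial_prompt=INITIAL_PROMPT)
print(" ".join(postprocess(seg.text) for seg in segments))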

License and attribution

  • Code & adapter weights: MIT
  • Base model (openai/whisper-large-v3-turbo): MIT (Whisper)
  • Training data (asr-nigerian-pidgin/nigerian-pidgin-1.0): CC-BY-4.0 — attribution to the original dataset authors is required for any downstream use.

If you use this model in research or a product, please cite:

Citation

@misc{odafe2026pidginwhisper,
  title  = {Whisper Pidgin v1: Nigerian Pidgin English Speech-to-Text},
  author = {Odafe, Michael},
  year   = {2026},
  url    = {https://huggingface.co/michaelodafe/whisper-pidgin-v1},
  note   = {LoRA fine-tune of openai/whisper-large-v3-turbo}
}

Acknowledgments

  • The asr-nigerian-pidgin/nigerian-pidgin-1.0 dataset team for releasing the only sizeable open Pidgin ASR corpus.
  • OpenAI for Whisper.
  • HuggingFace for hosting and the transformers / datasets / peft libraries.
  • SYSTRAN for faster-whisper and CTranslate2.
  • The Silero team for VAD.
  • Kaggle for free GPU compute.