Whisper Small: Igbo

Part of Olu Igbo ("Igbo Voice"), an offline, on-device Igbo speech recognition project built for the Arm Create: AI Optimization Challenge 2026. See the full project on GitHub.

A LoRA fine-tune of openai/whisper-small for Igbo automatic speech recognition. Igbo isn't one of Whisper's 99 native languages, so this uses the <|yo|> (Yoruba) language token as a proxy during both training and inference.

Results

62.45% WER on the FLEURS Igbo test set (969 samples), down from a 68.95% baseline, a reduction of 6.50 percentage points. The figure was verified on the full test set, not estimated from training metrics.
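For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is commonly computed: word-level edit distance divided by the number of reference words. This is an illustration only. The function names are hypothetical, and the evaluation behind the figures above may have used a different implementation or text normalization. Corpus-level WER is also usually computed by summing errors and reference words over all utterances, not by averaging per-utterance scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, fine_tuned: float) -> float:
    """Fraction of the baseline error removed by the fine-tuned model."""
    return (baseline - fine_tuned) / baseline


# One substituted word out of three reference words.
example_wer = word_error_rate("a b c", "a x c")

# Relative WER reduction implied by the two figures reported above.
reported_gain = relative_reduction(68.95, 62.45)
```

By this arithmetic, the drop from 68.95% to 62.45% is a relative reduction of roughly 9.4%.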

Training data

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base_model, "theelvace/whisper-small-igbo")
model = model.merge_and_unload()

# Igbo language token proxy: <|yo|> (Yoruba), see note above
forced_decoder_ids = [[1, 50325], [2, 50359], [3, 50363]]

# audio_array is not defined in this snippet: supply a 1-D float waveform sampled at 16 kHz
inputs = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
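The three numeric IDs in forced_decoder_ids are easier to audit when named. The sketch below rebuilds the same list from labeled constants. The labels assume the multilingual openai/whisper-small vocabulary, where 50325 is <|yo|>, 50359 is <|transcribe|>, and 50363 is <|notimestamps|>. The constant and function names are hypothetical. The mapping can be checked against the tokenizer with processor.tokenizer.convert_tokens_to_ids("<|yo|>").

```python
# Token IDs copied from the usage snippet above. The labels assume the
# multilingual openai/whisper-small vocabulary.
YORUBA_LANGUAGE_ID = 50325  # <|yo|>, used as the proxy for Igbo
TRANSCRIBE_TASK_ID = 50359  # <|transcribe|>
NO_TIMESTAMPS_ID = 50363    # <|notimestamps|>


def build_forced_decoder_ids(language_id, task_id, no_timestamps_id):
    """Pair each prompt token with its decoder position, starting at 1.

    Position 0 is left out because it holds the start-of-transcript token.
    """
    prompt_tokens = (language_id, task_id, no_timestamps_id)
    return [[position, token_id]
            for position, token_id in enumerate(prompt_tokens, start=1)]


forced_decoder_ids = build_forced_decoder_ids(
    YORUBA_LANGUAGE_ID, TRANSCRIBE_TASK_ID, NO_TIMESTAMPS_ID
)
```

The result matches the hard-coded list in the usage snippet, so it can be passed to model.generate in the same way.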

On-device deployment

This repo also hosts ONNX exports (encoder, cross-attention initializer, KV-cache decoder) used to run this model fully on-device on Android, with no cloud inference. See the Olu Igbo GitHub repo for the full mobile app, export scripts, and benchmarks on a Snapdragon 678 device.

Limitations

  • 62.45% WER reflects real, measured performance, not a polished demo number: short, clear utterances transcribe more reliably than long or complex ones.
  • Performance on live microphone audio in real-world noise conditions will generally be lower than the FLEURS test set figure, which is measured on clean studio recordings.

License

MIT
