Whisper large for Ugandan languages

This model is an adaptation of whisper-large-v2 for the following languages widely spoken in Uganda: Luganda, Acholi, Lugbara, Ateso, Runyankole and English (Ugandan accent).

Training

The model was trained with the SALT dataset, Common Voice (Luganda) and FLEURS datasets. To help with generalisation in practical settings, training used addition of random noise and random downsampling to 8kHz to simulate phone speech.

Usage

The model is used in a similar way to the base Whisper model. The model will attempt to auto-detect the language and provide a transcription. However, note that language detection is not always accurate and results may be improved by specifying it instead. The languages in this model are not supported by the base Whisper model, so the format is slightly different:

import transformers
import datasets
import torch

processor = transformers.WhisperProcessor.from_pretrained(
    "Sunbird/asr-whisper-large-v2-salt")
model = transformers.WhisperForConditionalGeneration.from_pretrained(
    "Sunbird/asr-whisper-large-v2-salt")

SALT_LANGUAGE_TOKENS_WHISPER = {
    'eng': 50259,  # English (Ugandan)
    'ach': 50357,  # Acholi
    'lgg': 50356,  # Lugbara
    'lug': 50355,  # Luganda
    'nyn': 50354,  # Runyankole
    'teo': 50353,  # Ateso
}

# Get some test audio
ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
audio = ds[0]['audio']
sample_rate = ds[0]['sample_rate']

# Specify a language from one of the above.
lang = 'lug'

# Apply the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_features = processor(
    audio, sampling_rate=sample_rate, return_tensors="pt").input_features
input_features = input_features.to(device)
predicted_ids = model.to(device).generate(
    input_features,
    # Optionally set language=None here instead to auto-detect.
    language=processor.tokenizer.decode(SALT_LANGUAGE_TOKENS_WHISPER[lang]),
    forced_decoder_ids=None)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
# Ekikoola kya kasooli kya kyenvu wabula langi yaakyo etera okuba eya kitaka wansi.

Performance Metrics

Lang CER WER
eng 0.005 0.013
lug 0.020 0.095
ach 0.059 0.242
lgg 0.059 0.227
teo 0.069 0.256
nyn 0.079 0.316
xog 0.100 0.461
myx 0.119 0.475
swa 0.183 0.249
kin 0.216 0.474
mean 0.091 0.281
Downloads last month
320
Safetensors
Model size
1.54B params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Sunbird/asr-whisper-large-v2-salt

Finetuned
(202)
this model

Dataset used to train Sunbird/asr-whisper-large-v2-salt