# MMS speech recognition for Ugandan languages
This is a fine-tuned version of facebook/mms-1b-all for Ugandan languages, trained with the SALT dataset. The languages supported are:
| code | language |
|---|---|
| lug | Luganda |
| ach | Acholi |
| lgg | Lugbara |
| teo | Ateso |
| nyn | Runyankole |
| eng | English (Ugandan) |
For each language there are two adapters: one optimised for cases where the speech is only in that language, and another in which code-switching with English is expected.
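This naming convention can be captured in a small helper; `select_adapter` is an illustrative name introduced here, not part of the model or library:

```python
def select_adapter(language: str, code_switching: bool) -> str:
    """Return the adapter name for a language, following the
    '<lang>' / '<lang>+eng' pattern described above."""
    return f'{language}+eng' if code_switching else language

print(select_adapter('lug', True))   # lug+eng
print(select_adapter('ach', False))  # ach
```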
## Usage
Usage is the same as for the base model, but with a different set of adapters available.
```python
import torch
import transformers
import datasets

# Available adapters:
# ['lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
#  'nyn', 'nyn+eng', 'teo', 'teo+eng']
language = 'lug'

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = transformers.Wav2Vec2ForCTC.from_pretrained(
    'Sunbird/asr-mms-salt').to(device)
model.load_adapter(language)

processor = transformers.Wav2Vec2Processor.from_pretrained(
    'Sunbird/asr-mms-salt')
processor.tokenizer.set_target_lang(language)

# Get some test audio
ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
audio = ds[0]['audio']
sample_rate = ds[0]['sample_rate']

# Apply the model
inputs = processor(audio, sampling_rate=sample_rate, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs.to(device)).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
print(transcription)
# ekikola ky'akasooli kyakyenvu wabula langi yakyo etera okuba eyaakitaka wansi
```
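Wav2Vec2-based models such as MMS expect 16 kHz input, which is why the sample rate is passed to the processor above. If you bring your own recordings at a different rate, resample them first. Below is a minimal linear-interpolation sketch for quick experiments; the `resample_to_16k` helper is illustrative, and a proper resampler from `torchaudio` or `librosa` should be preferred in practice:

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Resample a 1-D waveform to 16 kHz by linear interpolation.

    Illustrative only: linear interpolation does not low-pass filter,
    so use torchaudio/librosa resampling for real applications.
    """
    target_rate = 16000
    if sample_rate == target_rate:
        return audio
    duration = len(audio) / sample_rate
    n_out = int(round(duration * target_rate))
    t_in = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)

# One second of audio at 8 kHz becomes 16000 samples at 16 kHz.
wave = np.sin(np.linspace(0, 2 * np.pi * 440, 8000))
print(len(resample_to_16k(wave, 8000)))  # 16000
```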
The output of this model is unpunctuated and lower case. For applications requiring formatted text, an alternative model is Sunbird/asr-whisper-large-v2-salt.
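If only light formatting is needed rather than full punctuation restoration, a simple post-processing step can sentence-case the transcript. The `format_transcript` helper below is an illustrative sketch, not part of the model:

```python
def format_transcript(text: str) -> str:
    """Capitalise the first character and ensure a terminal full stop.

    A crude stand-in for a real punctuation-restoration model such as
    the Whisper-based alternative mentioned above.
    """
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in '.!?':
        text += '.'
    return text

print(format_transcript('ekikola kyakyenvu'))  # Ekikola kyakyenvu.
```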