msingiai/sauti-asr
This card describes the current Track A production candidate from the sauti-asr repository. The public training-data summary below leaves out restricted internal sources. That omission does not mean the checkpoint was retrained without them.
Model Summary
This model is a fine-tuned version of microsoft/paza-whisper-large-v3-turbo for Swahili automatic
speech recognition. It comes from the Sauti ASR Track A pipeline in the
sauti-asr repository.
Intended Release Type
- Release profile: `current-preview`
- Intended use: public research and product evaluation
Evaluation Snapshot
The current Track A checkpoint was evaluated on a held-out test set of 500 Kenyan Swahili samples. The scores below are self-reported.
| Metric | Value |
|---|---|
| WER | 13.72% |
| CER | 3.88% |
| Reference words | 10395 |
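WER and CER are corpus-level edit-distance rates: the total number of word (or character) substitutions, deletions, and insertions divided by the total number of reference words (or characters). The pure-Python sketch below shows that computation. The `normalize` step (lowercasing and stripping punctuation) is an assumption for illustration; this card does not state which text normalization produced the reported numbers, and normalization choices can shift WER noticeably.

```python
import re
import unicodedata


def normalize(text):
    # Assumption: lowercase, replace punctuation (except apostrophes, as in
    # "ng'ombe") with spaces, and collapse whitespace. The card does not say
    # which normalization produced its reported scores.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def edit_distance(ref, hyp):
    # Levenshtein distance: minimum substitutions + deletions + insertions
    # needed to turn the reference sequence into the hypothesis sequence.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer_cer(refs, hyps):
    # Corpus-level rates: total errors over total reference units,
    # not an average of per-utterance rates.
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    word_errors = word_total = char_errors = char_total = 0
    for ref, hyp in zip(refs, hyps):
        ref_n, hyp_n = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref_n.split(), hyp_n.split()
        word_errors += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # CER here counts spaces as characters.
        char_errors += edit_distance(list(ref_n), list(hyp_n))
        char_total += len(ref_n)
    return word_errors / word_total, char_errors / char_total


refs = ["Habari ya asubuhi.", "Ninaenda sokoni leo"]
hyps = ["habari za asubuhi", "ninaenda sokoni"]
wer, cer = corpus_wer_cer(refs, hyps)
print(f"WER={wer:.2%} CER={cer:.2%}")  # WER=33.33% CER=13.89%
```

As a sanity check on the table, 13.72% of 10,395 reference words is roughly 1,426 word-level errors across the 500 samples. For real evaluations, a maintained library such as `jiwer` (which provides `wer` and `cer` functions) is a safer choice than a hand-rolled implementation.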
Training Data
The release flow in this repository tracks the following dataset mix:
| Dataset | License | Notes |
|---|---|---|
| mozilla-common-voice | Common Voice (CC0) | Used in repo Track A pipeline |
| google-fleurs | FLEURS (CC-BY-4.0) | Used in repo Track A pipeline |
| alffa-swahili-news | ALFFA / OpenSLR (MIT) | Used in repo Track A pipeline |
| keystats-swahili-asr-data | KeyStats (Apache-2.0) | Used in repo Track A pipeline |
Known Limitations
- Performance is weaker on code-switched Swahili/English speech.
- Named entities, abbreviations, and numbers remain difficult.
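- Long-form transcription should use chunked decoding (for example, `chunk_length_s` in the Usage example) rather than a single-pass decode, because Whisper-based models process at most 30 seconds of audio per input window.
- The checkpoint is useful for Swahili ASR evaluation and product prototyping, but the training data listed publicly is narrower than the full historical training mix used in the repository.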
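Chunked decoding splits a long recording into fixed-length windows that overlap slightly, transcribes each window, and stitches the results together. The hypothetical helper `chunk_windows` below computes only the window boundaries, to illustrate what a 25-second chunk length implies. It is not the transformers implementation and does not merge the overlapping transcripts. The default overlap of `chunk_s / 6` on each side follows the documented `stride_length_s` default of the transformers ASR pipeline, but treat the exact values as an assumption.

```python
def chunk_windows(num_samples, sampling_rate=16_000, chunk_s=25.0, stride_s=None):
    """Return (start, end) sample indices of overlapping decoding windows.

    Hypothetical helper for illustration; not the transformers code.
    """
    if stride_s is None:
        # Assumption: overlap of chunk_s / 6 on each side, mirroring the
        # documented stride_length_s default of the transformers ASR pipeline.
        stride_s = chunk_s / 6
    chunk = int(chunk_s * sampling_rate)
    stride = int(stride_s * sampling_rate)
    step = chunk - 2 * stride  # new audio covered by each successive window
    if step <= 0:
        raise ValueError("stride_s must be less than half of chunk_s")
    windows = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        windows.append((start, end))
        if end == num_samples:
            break
        start += step
    return windows


# A 60-second clip at 16 kHz (the sampling rate Whisper models expect).
for start, end in chunk_windows(60 * 16_000):
    print(f"{start / 16_000:6.2f}s -> {end / 16_000:6.2f}s")
```

With these defaults, a 60-second clip yields four overlapping windows; the overlap gives each window some context on both sides so that words cut at a boundary can be recovered when the transcripts are merged.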
Usage
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "msingiai/sauti-asr"

# Use half precision on GPU and full precision on CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# chunk_length_s splits long recordings into 25-second windows
# (see Known Limitations on long-form transcription).
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    chunk_length_s=25,
)

# Decoding an audio file from a path requires ffmpeg to be installed.
result = pipe("audio.wav")
print(result["text"])
```
Source Repository
The training, evaluation, and serving code lives in:
Msingi-AI/sauti-asr
Responsible Use
This model transcribes speech. Users are responsible for obtaining rights and consent for audio they process, especially for clinical, customer-support, or other sensitive recordings.