MALIBA-ASR-v1: Revolutionizing Bambara Speech Recognition
MALIBA-ASR-v1 represents a breakthrough in African language technology, setting a new standard for Bambara speech recognition. Developed by MALIBA-AI, this model significantly outperforms all existing open-source solutions for Bambara ASR, bringing high-quality speech technology to Mali's most widely spoken language.
Bridging the Digital Language Divide
Despite being spoken by over 22 million people, Bambara has remained severely underrepresented in speech technology. MALIBA-ASR-v1 directly addresses this critical gap, achieving performance levels that make digital voice interfaces accessible to Bambara speakers. This work represents a crucial step toward digital language equality and demonstrates that high-quality speech technology is possible for African languages.
Performance Metrics
MALIBA-ASR-v1 achieves breakthrough results on the oza75/bambara-asr benchmark:
Metric | Value |
---|---|
WER | 0.226 |
CER | 0.109 |
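For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. This is a minimal illustration, not the evaluation script behind the figures above; the function names and the placeholder strings are hypothetical, and any text normalization (casing, punctuation) applied in the reported evaluation is not stated in this card.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real Bambara transcripts.
print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words
print(cer("a b c d", "a x c"))  # three character edits over 7 characters
```

Under this definition, a WER of 0.226 corresponds to roughly 23 word-level edits per 100 reference words.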
Exceptional Code-Switching Capabilities
One of the most significant advantages of MALIBA-ASR-v1 is its handling of code-switching, the natural mixing of Bambara with French or other languages that characterizes everyday speech in Mali. MALIBA-ASR-v1 accurately transcribes multilingual content, making it practical for real-world applications.
Transforming Access to Technology in Mali
MALIBA-ASR-v1 enables numerous applications previously unavailable to Bambara speakers:
- Healthcare: Voice interfaces for medical information and services
- Education: Audio-based learning tools for literacy and education
- News & Media: Automated transcription of Bambara broadcasts and podcasts
- Preservation: Documentation of oral histories and traditional knowledge
- Accessibility: Voice technologies for visually impaired Bambara speakers
- Mobile Access: Voice commands for smartphone users with limited literacy
Training Details
Dataset and Evaluation
The model was trained on the [coming soon] dataset, representing diverse speakers, dialects, and recording conditions.
Training Procedure
- Base Model: openai/whisper-large-v2
- Adaptation Method: LoRA (PEFT)
- Training Duration: 6 epochs
- Batch Size: 128 (32 per device with gradient accumulation steps of 4)
- Learning Rate: 0.001 with linear scheduler and 50 warmup steps
- Mixed Precision: Native AMP
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
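The batch-size and learning-rate settings above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a single training device (the only case in which 32 per device with 4 accumulation steps gives 128), takes the total of 3186 optimizer steps from the training results table, and models the linear scheduler as a ramp from 0 to the peak rate over the warmup steps followed by a linear decay to 0. The function and constant names are hypothetical, not taken from the training code.

```python
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1        # assumption: 32 * 4 = 128 only holds for one device
PEAK_LR = 1e-3
WARMUP_STEPS = 50
TOTAL_STEPS = 3186     # last step listed in the training results table


def effective_batch_size(per_device=PER_DEVICE_BATCH,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of samples contributing to each optimizer update."""
    return per_device * accum * devices


def linear_schedule_lr(step, peak_lr=PEAK_LR,
                       warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))


print(effective_batch_size())          # 128
for step in (0, 25, 50, 1618, 3186):
    print(step, linear_schedule_lr(step))
```

With these settings the rate reaches its peak of 0.001 at step 50 and falls to about half of that near step 1618, the midpoint of the decay phase.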
Training Results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.3265 | 1.0 | 531 | 0.4117 |
0.2711 | 2.0 | 1062 | 0.3612 |
0.2230 | 3.0 | 1593 | 0.3397 |
0.1802 | 4.0 | 2124 | 0.3330 |
0.1268 | 5.0 | 2655 | 0.3339 |
0.0932 | 6.0 | 3186 | 0.3491 |
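One way to read this table is to look for the epoch with the lowest validation loss and to track how the gap between training and validation loss evolves. The short sketch below does that with the values copied from the table; it is only an analysis aid, and it does not indicate which checkpoint was actually published, which this card does not state.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above
results = [
    (1, 531, 0.3265, 0.4117),
    (2, 1062, 0.2711, 0.3612),
    (3, 1593, 0.2230, 0.3397),
    (4, 2124, 0.1802, 0.3330),
    (5, 2655, 0.1268, 0.3339),
    (6, 3186, 0.0932, 0.3491),
]

# Epoch with the lowest validation loss.
best = min(results, key=lambda row: row[3])
print(f"lowest validation loss: epoch {best[0]}, step {best[1]}, loss {best[3]:.4f}")

# Gap between validation and training loss per epoch.
gaps = [(epoch, round(val - train, 4)) for epoch, _, train, val in results]
for epoch, gap in gaps:
    print(f"epoch {epoch}: validation - training = {gap:.4f}")
```

Validation loss bottoms out at epoch 4 (0.3330) and rises afterwards while training loss keeps falling, so the gap widens every epoch, a pattern usually read as the onset of overfitting.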
Usage Examples
COMING SOON
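Until official examples are published, inference could plausibly look like the sketch below. It rests on several assumptions that this card does not confirm: that the repository id is the one in the citation URL, that the repository holds LoRA adapter weights to be applied on top of openai/whisper-large-v2 with the peft library (rather than a merged model), and that no special language or task prompt is required. The helper names `load_model` and `transcribe` are hypothetical.

```python
BASE_MODEL_ID = "openai/whisper-large-v2"   # base model named under Training Procedure
ADAPTER_ID = "MALIBA-AI/maliba-asr-v1"      # assumption: repo id from the citation URL
SAMPLE_RATE = 16_000                        # Whisper models expect 16 kHz mono audio


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the Whisper base model and attach the adapter (assumed to be LoRA)."""
    # Local imports keep this module importable without the heavy dependencies.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(base_id)
    base = WhisperForConditionalGeneration.from_pretrained(base_id)
    # Assumption: adapter_id points at adapter weights, not a merged checkpoint.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return processor, model


def transcribe(waveform, processor, model):
    """Transcribe a 1-D float waveform sampled at SAMPLE_RATE."""
    import torch

    inputs = processor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        token_ids = model.generate(input_features=inputs.input_features)
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]
```

A typical call would load the model once with `processor, model = load_model()` and then pass each 16 kHz waveform (for example a NumPy array read from a WAV file) to `transcribe`. If the repository turns out to contain a merged model, loading it directly with `WhisperForConditionalGeneration.from_pretrained` would replace the adapter step.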
The MALIBA-AI Impact
MALIBA-ASR-v1 is part of MALIBA-AI's broader mission to ensure "No Malian Language Left Behind." This initiative is actively transforming Mali's digital landscape by:
- Breaking Language Barriers: Providing technology in languages that Malians actually speak
- Enabling Local Innovation: Allowing Malian developers to build voice-based applications
- Preserving Cultural Heritage: Digitizing and preserving Mali's rich oral traditions
- Democratizing AI: Making cutting-edge technology accessible to all Malians regardless of literacy level
- Building Local Expertise: Training Malian AI practitioners and researchers
Future Development
MALIBA-AI is committed to continuing this work with:
- Extension to other Malian languages (Songhoy, Pular, Tamasheq, etc.)
Join Our Mission
MALIBA-ASR-v1 embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.
Join our mission to democratize AI technology:
- Open Science: Use and build upon our research; all code, models, and documentation are open source
- Data Contribution: Share your Bambara speech datasets to help improve model performance
- Research Collaboration: Integrate our models into your research projects and share your findings
- Application Development: Build tools that serve Malian communities using our models
- Educational Impact: Use our models in educational settings to train the next generation of African AI researchers
License
This model is released under the Apache 2.0 license to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection.
Citation
@misc{maliba-asr-v1,
author = {MALIBA-AI},
title = {MALIBA-ASR-v1: Bambara Automatic Speech Recognition},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/MALIBA-AI/maliba-asr-v1}}
}
Acknowledgements
- We thank OpenAI for the Whisper model that served as our foundation
- We acknowledge the jeli-asr contributor, [coming soon], for providing the Bambara ASR dataset
- We appreciate the support of the Bambara-speaking community in Mali
MALIBA-AI: Empowering Mali's Future Through Community-Driven AI Innovation
"No Malian Language Left Behind"