MALIBA-ASR-v1: Revolutionizing Bambara Speech Recognition

MALIBA-ASR-v1 represents a breakthrough in African language technology, setting a new standard for Bambara speech recognition. Developed by MALIBA-AI, this model significantly outperforms existing open-source solutions for Bambara ASR, bringing high-quality speech technology to Mali's most widely spoken language.

Bridging the Digital Language Divide

Despite being spoken by over 22 million people, Bambara has remained severely underrepresented in speech technology. MALIBA-ASR-v1 directly addresses this critical gap, achieving performance levels that make digital voice interfaces accessible to Bambara speakers. This work represents a crucial step toward digital language equality and demonstrates that high-quality speech technology is possible for African languages.

Performance Metrics

MALIBA-ASR-v1 achieves the following word error rate (WER) and character error rate (CER) on the oza75/bambara-asr benchmark:

Metric	Value
WER	0.22
CER	0.10
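
For readers unfamiliar with these metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. It is not the evaluation script used for this model. The function names and the example strings are illustrative assumptions, and it assumes the values above are fractions (0.22 meaning 22%).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Illustrative strings only (not taken from the benchmark).
reference = "i ni ce"
hypothesis = "i ni se"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER, which is consistent with the gap between the two values reported above.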

Exceptional Code-Switching Capabilities

One of the most significant advantages of MALIBA-ASR-v1 is its ability to handle code-switching, the natural mixing of Bambara with French or other languages that characterizes everyday speech in Mali. MALIBA-ASR-v1 accurately transcribes multilingual content, making it practical for real-world applications.

Transforming Access to Technology in Mali

MALIBA-ASR-v1 enables numerous applications previously unavailable to Bambara speakers:

Healthcare: Voice interfaces for medical information and services
Education: Audio-based learning tools for literacy and education
News & Media: Automated transcription of Bambara broadcasts and podcasts
Preservation: Documentation of oral histories and traditional knowledge
Accessibility: Voice technologies for visually impaired Bambara speakers
Mobile Access: Voice commands for smartphone users with limited literacy

Training Details

Dataset and Evaluation

The model was trained on the [coming soon] dataset, representing diverse speakers, dialects, and recording conditions.

Training Procedure

Base Model: openai/whisper-large-v2
Adaptation Method: LoRA (PEFT)
Training Duration: 6 epochs
Batch Size: 128 (32 per device with gradient accumulation steps of 4)
Learning Rate: 0.001 with linear scheduler and 50 warmup steps
Mixed Precision: Native AMP
Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
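
The batch size and learning-rate settings above can be checked with a short calculation. This sketch assumes a single training device (32 x 4 = 128 only holds with one device), takes the total step count of 3186 from the training results table, and assumes the linear scheduler follows the common convention of ramping from zero to the peak rate during warmup and then decaying linearly to zero. The function names are illustrative.

```python
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 4
PEAK_LR = 1e-3
WARMUP_STEPS = 50
TOTAL_STEPS = 3186  # last step in the training results table (assumption)


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))


print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
for step in (0, 25, 50, 1593, 3186):
    print(f"step {step:>4}: lr = {linear_lr(step):.6f}")
```

With only 50 warmup steps out of 3186, the learning rate reaches its peak within the first 2% of training and spends the rest of the run decaying.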

Training Results

Training Loss	Epoch	Step	Validation Loss
0.3265	1.0	531	0.4117
0.2711	2.0	1062	0.3612
0.2230	3.0	1593	0.3397
0.1802	4.0	2124	0.3330
0.1268	5.0	2655	0.3339
0.0932	6.0	3186	0.3491
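
The table shows validation loss reaching its minimum at epoch 4 and rising afterwards while training loss keeps falling, a pattern that usually indicates the onset of overfitting. The sketch below reads the values from the table and makes this explicit. It does not claim which checkpoint was actually released, and the helper names are illustrative.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above
RESULTS = [
    (1, 531, 0.3265, 0.4117),
    (2, 1062, 0.2711, 0.3612),
    (3, 1593, 0.2230, 0.3397),
    (4, 2124, 0.1802, 0.3330),
    (5, 2655, 0.1268, 0.3339),
    (6, 3186, 0.0932, 0.3491),
]


def best_checkpoint(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def generalization_gaps(rows):
    """(epoch, validation_loss - training_loss) for each row."""
    return [(epoch, round(val - train, 4)) for epoch, _, train, val in rows]


epoch, step, _, val_loss = best_checkpoint(RESULTS)
print(f"lowest validation loss {val_loss:.4f} at epoch {epoch} (step {step})")
for epoch, gap in generalization_gaps(RESULTS):
    print(f"epoch {epoch}: validation - training = {gap:.4f}")
```

The gap between validation and training loss grows at every epoch, so a checkpoint around epoch 4 or 5 would be a reasonable candidate if selection is based on validation loss alone.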

Usage Examples

    COMING SOON

The MALIBA-AI Impact

MALIBA-ASR-v1 is part of MALIBA-AI's broader mission to ensure "No Malian Language Left Behind." This initiative is actively transforming Mali's digital landscape by:

Breaking Language Barriers: Providing technology in languages that Malians actually speak
Enabling Local Innovation: Allowing Malian developers to build voice-based applications
Preserving Cultural Heritage: Digitizing and preserving Mali's rich oral traditions
Democratizing AI: Making cutting-edge technology accessible to all Malians regardless of literacy level
Building Local Expertise: Training Malian AI practitioners and researchers

Future Development

MALIBA-AI is committed to continuing this work with:

Extension to other Malian languages (Songhoy, Pular, Tamasheq, etc.)

Join Our Mission

MALIBA-ASR-v1 embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.

Join our mission to democratize AI technology:

Open Science: Use and build upon our research - all code, models, and documentation are open source
Data Contribution: Share your Bambara speech datasets to help improve model performance
Research Collaboration: Integrate our models into your research projects and share your findings
Application Development: Build tools that serve Malian communities using our models
Educational Impact: Use our models in educational settings to train the next generation of African AI researchers

License

This model is released under the Apache 2.0 license to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection.

Citation

@misc{maliba-asr-v1,
  author = {MALIBA-AI},
  title = {MALIBA-ASR-v1: Bambara Automatic Speech Recognition},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MALIBA-AI/maliba-asr-v1}}
}

Acknowledgements

We thank OpenAI for the Whisper model that served as our foundation.
We acknowledge the jeli-asr contributors, [coming soon], for providing the Bambara ASR dataset.
We appreciate the support of the Bambara-speaking community in Mali.

MALIBA-AI: Empowering Mali's Future Through Community-Driven AI Innovation

"No Malian Language Left Behind"
