+---
+license: cc-by-nc-4.0
+language: ddn
+metrics:
+- wer
+tags:
+- text-to-audio
+- automatic-speech-recognition
+- wav2vec2-fine-tuning
+- dendi-text-to-speech
+model-index:
+- name: Dendi Numerals ASR
+  results:
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: dendi
+      type: dendi_numbers_dataset
+    metrics:
+       - name: Test WER
+         type: wer
+         value: 18.18
+pipeline_tag: automatic-speech-recognition
+---
+# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition
+This repository contains an Automatic Speech Recognition (ASR) model built specifically to recognize numerals in the Dendi (ddn) language.
+The model recognizes spoken Dendi numbers ranging from 0 to 1,000,000,000.
+This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app that makes calculations accessible in six local languages of Benin, with voice reading and AI capabilities.
+You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).
+CTT-ASR is available in the 🤗 Transformers library from version 4.4 onwards.
+## Model Details
+The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
+When using this model, make sure that your speech input is sampled at 16 kHz.
+## Usage
+To use this model, first install the latest version of the 🤗 Transformers library:
+```
+pip install --upgrade transformers accelerate
+```
+Then, run inference with the following code snippet:
+```python
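+import torch
+import torchaudio
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+# Load the processor (feature extractor + tokenizer) and the fine-tuned model
+processor = Wav2Vec2Processor.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
+model = Wav2Vec2ForCTC.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
+
+# Load the audio and resample it to the 16 kHz rate the model expects
+speech_array, sampling_rate = torchaudio.load("audio_test.wav")
+if sampling_rate != 16_000:
+    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16_000)
+speech_array = speech_array.mean(dim=0).numpy()  # downmix channels to mono
+
+inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
+with torch.no_grad():
+    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+# Greedy decoding: take the most likely token per frame, then decode to text
+output = processor.batch_decode(torch.argmax(logits, dim=-1))
+print("Output:", output)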
+```
+You can listen to the sample audio here:
+<audio controls>
+  <source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
+  Your browser does not support the audio element.
+</audio>
+Upon processing the sample audio, the model produces the following output:
+```
+Output: ['zangu ihaaku nda weiguu']
+```
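The `torch.argmax` plus `batch_decode` step in the snippet above is greedy CTC decoding: pick the most likely token for each audio frame, collapse consecutive repeats, drop the blank token, and turn word delimiters into spaces. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The vocabulary, the blank id, the `|` delimiter, and the frame ids are all hypothetical values chosen for illustration; the real model ships its own vocabulary through the processor.

```python
from itertools import groupby

# Hypothetical toy vocabulary (illustration only, not the model's real vocabulary).
VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "h", 4: "k", 5: "u"}
BLANK_ID = 0          # assumption: the pad token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumption: "|" marks word boundaries


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame predictions, drop blanks, and join characters."""
    # 1. Merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove blanks; a blank between two equal ids keeps them as two characters.
    chars = [VOCAB[token_id] for token_id in collapsed if token_id != BLANK_ID]
    # 3. Replace word delimiters with spaces.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Made-up per-frame argmax ids, as torch.argmax(logits, dim=-1) would produce.
frame_ids = [0, 3, 3, 0, 2, 2, 0, 2, 4, 4, 0, 5, 5, 1, 0]
decoded = greedy_ctc_decode(frame_ids)
print(decoded)
```

Note how the blank between the two runs of `a` is what lets the decoder emit a doubled letter instead of merging them.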
+### Evaluation result
+On the test set, the model achieves a Word Error Rate (WER) of **18.18%**.
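For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis and its reference, divided by the number of reference words. The sketch below assumes that standard definition; it is not necessarily the exact script used to obtain the figure above. The function name `word_error_rate` is hypothetical, and the hypothesis string is a made-up example in which one word is dropped from the sample transcription.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1] / len(ref)


reference = "zangu ihaaku nda weiguu"   # transcription of the sample audio
hypothesis = "zangu ihaaku weiguu"      # made-up output with one word missing
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

Over a whole test set, the usual practice is to sum the edit distances of all utterances and divide by the total number of reference words, rather than averaging per-utterance scores.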
+## Authors
+This model was developed by:
+- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32)) | (koravant1@gmail.com)
+- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji)) | (abiodouneti@gmail.com)
+## Citation
+```bibtex
+@misc { kora_guera_tovimafa_2024,
+	author       = { Salim KORA GUERA and Etienne TOVIMAFA },
+	title        = { wav2vec2-xlsr-dendi-ddn-for-numerals },
+	year         = 2024,
+	url          = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
+	doi          = { 10.57967/hf/2930 },
+	publisher    = { Hugging Face }
+}
+```
+## License
+The model is licensed under **CC-BY-NC 4.0**.