ssid32 committed on
Commit
96724c4
1 Parent(s): 834336e

Update README.md

  1. README.md +113 -3
README.md CHANGED
@@ -1,3 +1,113 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ language: ddn
4
+ metrics:
5
+ - wer
6
+ tags:
7
+ - text-to-audio
8
+ - automatic-speech-recognition
9
+ - wav2vec2-fine-tuning
10
+ - dendi-text-to-speech
11
+ model-index:
12
+ - name: Dendi Numerals ASR
13
+ results:
14
+ - task:
15
+ name: Speech Recognition
16
+ type: automatic-speech-recognition
17
+ dataset:
18
+ name: dendi
19
+ type: dendi_numbers_dataset
20
+ metrics:
21
+ - name: Test WER
22
+ type: wer
23
+ value: 18.18
24
+ pipeline_tag: automatic-speech-recognition
25
+ ---
26
+
27
+ # CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition
28
+
29
+ This repository contains an Automatic Speech Recognition (ASR) model built specifically to recognize spoken numerals in the Dendi (ddn) language.
30
+ The model covers numbers from 0 to 1,000,000,000 spoken in Dendi.
31
+
32
+ This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app designed to make calculations accessible in six local languages of Benin, featuring voice reading and AI capabilities.
33
+ You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).
34
+
35
+ CTT-ASR is available in the 🤗 Transformers library from version 4.4 onwards.
36
+
37
+ ## Model Details
38
+
39
+ The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
40
+ When using this model, make sure that your speech input is sampled at 16 kHz.
41
+
42
+
43
+ ## Usage
44
+
45
+ To use this model, first install the latest version of the 🤗 Transformers library:
46
+
47
+ ```
48
+ pip install --upgrade transformers accelerate
49
+ ```
50
+
51
+ Then, run inference with the following code snippet:
52
+
53
+ ```python
54
+ import torch
55
+ import torchaudio
56
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
57
+
58
+ processor = Wav2Vec2Processor.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
59
+ model = Wav2Vec2ForCTC.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
60
+
61
+ speech_array, sampling_rate = torchaudio.load("audio_test.wav")
62
+ speech_array = speech_array.squeeze().numpy()
63
+ inputs = processor(speech_array, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
64
+
65
+ with torch.no_grad():
66
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
67
+ output = processor.batch_decode(torch.argmax(logits, dim=-1))
68
+
69
+ print("Output:", output)
70
+
71
+ ```
72
+
73
+
74
+
75
+ You can listen to the sample audio here:
76
+
77
+ <audio controls>
78
+ <source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
79
+ Your browser does not support the audio element.
80
+ </audio>
81
+
82
+ Upon processing the sample audio, the model produces the following output:
83
+
84
+ ```
85
+ Output: ['zangu ihaaku nda weiguu']
86
+ ```
87
+
88
+ ### Evaluation result
89
+
90
+ On its test set, the model achieves a Word Error Rate (WER) of **18.18%**.
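
For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used for this model. The function name `word_error_rate` and the hypothesis string are hypothetical; only the reference string is taken from the sample output above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Reference taken from the sample output; the hypothesis is made up
# for illustration and drops one word (one deletion out of four words).
reference = "zangu ihaaku nda weiguu"
hypothesis = "zangu ihaaku weiguu"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
```

In practice a maintained implementation, such as the `wer` metric listed in this model card's metadata, is preferable to hand-rolled code.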
91
+
92
+ ## Authors
93
+
94
+ This model was developed by:
95
+ - Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32)) | (koravant1@gmail.com)
96
+ - Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji)) | (abiodouneti@gmail.com)
97
+
98
+ ## Citation
99
+
100
+ ```bibtex
101
+ @misc {
102
+ author = { {Salim KORA GUERA and Etienne TOVIMAFA} },
103
+ title = { wav2vec2-xlsr-dendi-ddn-for-numerals },
104
+ year = 2024,
105
+ url = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
106
+ doi = { 10.57967/hf/2930 },
107
+ publisher = { Hugging Face }
108
+ }
109
+ ```
110
+
111
+ ## License
112
+
113
+ The model is licensed under **CC-BY-NC 4.0**.