Update README.md
README.md
CHANGED
---
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
- ASR
- Hassaniya
- Mauritanian Arabic
- Arabic Dialects
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-samll-hassaniya
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Hassaniya Audio Dataset
      type: private
    metrics:
    - name: Word Error Rate
      type: wer
      value: 52.3635
    - name: Character Error Rate
      type: cer
      value: 19.0151
---
# whisper-samll-hassaniya

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on a private Hassaniya audio dataset.
It achieves the following results on the evaluation set:
- Wer: 52.3635
- Cer: 19.0151
## Model description

This model provides automatic speech recognition for Hassaniya, the Arabic dialect spoken in Mauritania, by fine-tuning Whisper on dialect-specific data. The project addresses both a technological need and a cultural imperative: preserving a linguistically unique form of Arabic.
## Intended Uses & Limitations

This model is intended for professional transcription services and linguistic research. It can produce textual representations of Hassaniya speech, contributing to digital heritage preservation and linguistic studies. Performance may vary with audio quality and speaker accent.
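A minimal usage sketch with the `transformers` pipeline API. The repository id below is an assumption taken from the model-index name above; substitute the actual Hub path before running.

```python
# Hedged usage sketch: load the fine-tuned checkpoint and transcribe one
# audio file. The model id is assumed from the model-index name, not
# confirmed by this card; replace it with the real Hub repository path.
def transcribe(audio_path: str, model_id: str = "whisper-samll-hassaniya") -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    from transformers import pipeline  # lazy import; needs transformers + torch

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]

# Example call (downloads the model on first use):
# text = transcribe("sample.wav")  # 16 kHz mono audio is the safest input
```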
## Training and Evaluation Data

The model was trained on a curated dataset of Hassaniya audio recordings collected through AudioScribe, an application dedicated to high-quality data collection. The dataset is divided into three subsets with the following total audio lengths:

- **Training set**: 5 hours 30 minutes
- **Testing set**: 4 minutes
- **Evaluation set**: 18 minutes

The dataset includes speech samples from native speakers across different age groups and genders to support robust model performance.
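As a quick sanity check, the three splits above total just under six hours of audio:

```python
# Total audio duration across the three splits listed above.
from datetime import timedelta

splits = {
    "train": timedelta(hours=5, minutes=30),
    "test": timedelta(minutes=4),
    "eval": timedelta(minutes=18),
}
total = sum(splits.values(), timedelta())
print(total)  # 5:52:00
```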
## Model Performance

The model was evaluated before and after fine-tuning. Key metrics:

- **Pre-training evaluation on the eval set**
  - WER: 108.7612
  - CER: 63.9705

- **Post-training evaluation on the eval set**
  - WER: 52.3635
  - CER: 19.0151

- **Post-training evaluation on the test set**
  - WER: 53.4791
  - CER: 19.6108

These results show a significant improvement across all metrics over the course of training, particularly in Word Error Rate and Character Error Rate, demonstrating the model's increased accuracy in recognizing Hassaniya speech.
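For reference, the WER and CER figures above are Levenshtein edit distances over words and characters respectively, normalized by reference length (and reported here multiplied by 100). A minimal self-contained sketch; evaluation libraries such as `jiwer` compute the same quantities:

```python
# Minimal sketch of the metrics reported above: WER and CER are Levenshtein
# edit distances over words / characters, divided by the reference length.

def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (rolling single row)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over word tokens."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over characters."""
    return edit_distance(reference, hypothesis) / len(reference)
```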
## Training Procedure

### Resource Capacity During Training

Training was conducted on Google Colab Pro with the following resources:

- **System RAM**: 51 GB of system memory, ample capacity for data processing and model operations during training.
- **GPU RAM**: 15 GB of GPU memory, sufficient for this model size and batch configuration, especially with mixed-precision training.
- **Disk space**: 201.2 GB, sufficient storage for the dataset, model checkpoints, and logs.

### Training Duration

Training completed in 160.32 minutes (approximately 2 hours 40 minutes).
### Training hyperparameters

The following hyperparameters were used during training:

- …
- training_steps: 2000
- mixed_precision_training: Native AMP
### Training Results

| Step | Epoch   | Training Loss | Validation Loss | Wer     | Cer     |
|:----:|:-------:|:-------------:|:---------------:|:-------:|:-------:|
| 100  | 2.2472  | 0.9126        | 0.8441          | 65.8109 | 25.6766 |
| 200  | 4.4944  | 0.2020        | 0.9230          | 57.5795 | 21.3726 |
| 300  | 6.7416  | 0.0854        | 1.0011          | 58.8020 | 21.1788 |
| 400  | 8.9888  | 0.0497        | 1.0513          | 57.2535 | 20.7988 |
| 500  | 11.2360 | 0.0358        | 1.0700          | 57.9055 | 21.7216 |
| 600  | 13.4831 | 0.0260        | 1.0964          | 56.2755 | 20.8918 |
| 700  | 15.7303 | 0.0199        | 1.1063          | 55.9495 | 20.2482 |
| 800  | 17.9775 | 0.0114        | 1.1709          | 56.0717 | 20.8530 |
| 900  | 20.2247 | 0.0084        | 1.1633          | 56.6830 | 20.4653 |
| 1000 | 22.4719 | 0.0050        | 1.1659          | 54.8900 | 20.4110 |
| 1100 | 24.7191 | 0.0026        | 1.1591          | 54.6455 | 20.2714 |
| 1200 | 26.9663 | 0.0010        | 1.1771          | 54.3602 | 19.6278 |
| 1300 | 29.2135 | 0.0005        | 1.1900          | 53.9527 | 19.4959 |
| 1400 | 31.4607 | 0.0004        | 1.1971          | 53.6675 | 19.3641 |
| 1500 | 33.7079 | 0.0003        | 1.2049          | 52.5672 | 19.0694 |
| 1600 | 35.9551 | 0.0003        | 1.2069          | 52.6895 | 19.1082 |
| 1700 | 38.2022 | 0.0002        | 1.2107          | 52.6487 | 19.0151 |
| 1800 | 40.4494 | 0.0002        | 1.2125          | 52.4450 | 19.0151 |
| 1900 | 42.6966 | 0.0002        | 1.2145          | 52.4450 | 19.0539 |
| 2000 | 44.9438 | 0.0002        | 1.2149          | 52.3635 | 19.0151 |
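The training log shows training loss approaching zero while validation loss rises from 0.8441 to 1.2149, which suggests the model memorizes the small training set, yet WER keeps improving. A small illustrative sketch of picking the best checkpoint from such a log (rows excerpted from the table above):

```python
# Rows excerpted from the training log above: (step, validation_loss, wer).
log = [
    (100, 0.8441, 65.8109),
    (1000, 1.1659, 54.8900),
    (1500, 1.2049, 52.5672),
    (2000, 1.2149, 52.3635),
]

# For ASR, checkpoints are usually selected by lowest WER rather than lowest
# validation loss (which here keeps rising as the model overfits).
best_step, _, best_wer = min(log, key=lambda row: row[2])
print(best_step, best_wer)  # 2000 52.3635
```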
### Framework versions

- Transformers 4.44.1
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1