huseinzol05 committed on
Commit
9204f13
1 Parent(s): 0b9b0fb

Update README.md

  1. README.md +43 -19
README.md CHANGED
@@ -11,37 +11,61 @@ probably proofread and complete it, then remove this comment. -->

  # wav2vec2-xls-r-300m-mixed

- This model was trained from scratch on an unknown dataset.
- It achieves the following results on the evaluation set:


- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32

- ### Training results



- ### Framework versions

- - Transformers 4.18.0
- - TensorFlow 2.6.0
- - Datasets 2.1.0
- - Tokenizers 0.12.1
+ Finetuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt

+ This model was finetuned on 3 languages,

+ 1. Malay
+ 2. Singlish
+ 3. Mandarin

+ **This model trained on a single RTX 3090 Ti 24GB VRAM, provided by https://mesolitica.com/**.

+ ## Evaluation set

+ Evaluation set from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt with sizes,

+ ```
+ len(malay), len(singlish), len(mandarin)
+ -> (765, 3579, 614)
+ ```

+ It achieves the following results on the evaluation set based on [evaluate-wav2vec2-xls-r-300m-mixed.ipynb](evaluate-wav2vec2-xls-r-300m-mixed.ipynb):

+ Mixed evaluation,

+ ```
+ CER: 0.04363189219453221
+ WER: 0.12446419219809059
+ CER with LM: 0.03621180629932558
+ WER with LM: 0.09152993800218129
+ ```

+ Malay evaluation,

+ ```
+ CER: 0.053659683623049854
+ WER: 0.22565751242221832
+ CER with LM: 0.036930421149001316
+ WER with LM: 0.14256712242006359
+ ```

+ Singlish evaluation,

+ ```
+ CER: 0.04174804195104746
+ WER: 0.10734402150682842
+ CER with LM: 0.03538238462620066
+ WER with LM: 0.08103191123663189
+ ```

+ Mandarin evaluation,

+ ```
+ CER: 0.04211892733885779
+ WER: 0.09817787449869257
+ CER with LM: 0.040151154521006656
+ WER with LM: 0.08913415903511501
+ ```
+
+ Language model from https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined
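
Review note: the README added in this commit does not say how the "Mixed evaluation" figures relate to the per-language ones. The sketch below is a sanity check, not code from the evaluation notebook. It tests the assumption that each mixed score is the mean of the per-language scores weighted by evaluation-set size (765, 3579, 614 utterances). The names `sizes`, `per_language`, `reported_mixed` and `weighted_mean` are illustrative; all numbers are copied from the diff.

```python
# Evaluation-set sizes (number of utterances), copied from the README diff.
sizes = {"malay": 765, "singlish": 3579, "mandarin": 614}

# Per-language scores, copied from the README diff.
per_language = {
    "malay": {
        "CER": 0.053659683623049854,
        "WER": 0.22565751242221832,
        "CER with LM": 0.036930421149001316,
        "WER with LM": 0.14256712242006359,
    },
    "singlish": {
        "CER": 0.04174804195104746,
        "WER": 0.10734402150682842,
        "CER with LM": 0.03538238462620066,
        "WER with LM": 0.08103191123663189,
    },
    "mandarin": {
        "CER": 0.04211892733885779,
        "WER": 0.09817787449869257,
        "CER with LM": 0.040151154521006656,
        "WER with LM": 0.08913415903511501,
    },
}

# "Mixed evaluation" scores as reported in the README diff.
reported_mixed = {
    "CER": 0.04363189219453221,
    "WER": 0.12446419219809059,
    "CER with LM": 0.03621180629932558,
    "WER with LM": 0.09152993800218129,
}


def weighted_mean(metric):
    """Mean of the per-language scores, weighted by utterance count."""
    total = sum(sizes.values())
    return sum(sizes[lang] * per_language[lang][metric] for lang in sizes) / total


for metric, reported in reported_mixed.items():
    estimate = weighted_mean(metric)
    print(f"{metric}: weighted={estimate:.8f} reported={reported:.8f} "
          f"diff={abs(estimate - reported):.2e}")
```

If the assumption holds, each weighted mean should agree with the reported mixed score to well within rounding. That would suggest the mixed numbers are averages of per-utterance scores rather than corpus-level error rates, which would be worth stating explicitly in the README.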