Tanel committed
Commit a1a327b
1 Parent(s): 7c9add2

Update README.md

Files changed (1)
  1. README.md +93 -93
README.md CHANGED
@@ -1,93 +1,93 @@
- ---
- license: cc-by-4.0
- tags:
- - audio
- - automatic-speech-recognition
- - hf-asr-leaderboard
- language: et
- model-index:
- - name: xls-r-300m-et
-   results:
-   - task:
-       name: Automatic Speech Recognition
-       type: automatic-speech-recognition
-     dataset:
-       name: Common Voice
-       type: common_voice
-       args: et
-     metrics:
-     - name: Test WER
-       type: wer
-       value: 0.12520395591222402
-     - name: Test CER
-       type: cer
-       value: 0.027091152438624897
-   - task:
-       name: Automatic Speech Recognition
-       type: automatic-speech-recognition
-     dataset:
-       name: Common Voice 8
-       type: mozilla-foundation/common_voice_8_0
-       args: et
-     metrics:
-     - name: Test WER
-       type: wer
-       value: 0.1338447882323104
-     - name: Test CER
-       type: cer
-       value: 0.029816686199500255
- ---
-
-
- # XLS-R-300m-ET
-
- This is a XLS-R-300M model [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) finetuned on around 800 hours of diverse Estonian data.
-
- ## Model description
- This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists of only the CTC-based end-to-end model, no language model is currently provided.
-
- ## Intended uses & limitations
-
- This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
-
- ## How to use
-
-
- TODO
-
- #### Limitations and bias
-
- Since this model was trained on mostly broadcast speech and texts from the web, it might have problems correctly decoding the following:
- * Speech containing technical and other domain-specific terms
- * Children's speech
- * Non-native speech
- * Speech recorded under very noisy conditions or with a microphone far from the speaker
- * Very spontaneous and overlapping speech
-
- ## Training data
- Acoustic training data:
-
- | Type | Amount (h) |
- |-----------------------|:------:|
- | Broadcast speech | 591 |
- | Spontaneous speech | 53 |
- | Elderly speech corpus | 53 |
- | Talks, lectures | 49 |
- | Parliament speeches | 31 |
- | *Total* | *761* |
-
-
- ## Training procedure
-
- Finetuned using Fairseq.
-
- ## Evaluation results
-
- ### WER
-
- |Dataset | WER |
- |---|---|
- | jutusaated.devset | 7.9 |
- | jutusaated.testset | 6.1 |
- | Common Voice 6.1 | 12.5 |
- | Common Voice 8.0 | 13.4 |
 
+ ---
+ license: cc-by-4.0
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - hf-asr-leaderboard
+ language: et
+ model-index:
+ - name: xls-r-300m-et
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice
+       type: common_voice
+       args: et
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 12.520395591222402
+     - name: Test CER
+       type: cer
+       value: 2.7091152438624897
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 8
+       type: mozilla-foundation/common_voice_8_0
+       args: et
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 13.38447882323104
+     - name: Test CER
+       type: cer
+       value: 2.9816686199500255
+ ---
+
+
+ # XLS-R-300m-ET
+
+ This is a XLS-R-300M model [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) finetuned on around 800 hours of diverse Estonian data.
+
+ ## Model description
+ This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists of only the CTC-based end-to-end model, no language model is currently provided.
+
+ ## Intended uses & limitations
+
+ This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
+
+ ## How to use
+
+
+ TODO
+
+ #### Limitations and bias
+
+ Since this model was trained on mostly broadcast speech and texts from the web, it might have problems correctly decoding the following:
+ * Speech containing technical and other domain-specific terms
+ * Children's speech
+ * Non-native speech
+ * Speech recorded under very noisy conditions or with a microphone far from the speaker
+ * Very spontaneous and overlapping speech
+
+ ## Training data
+ Acoustic training data:
+
+ | Type | Amount (h) |
+ |-----------------------|:------:|
+ | Broadcast speech | 591 |
+ | Spontaneous speech | 53 |
+ | Elderly speech corpus | 53 |
+ | Talks, lectures | 49 |
+ | Parliament speeches | 31 |
+ | *Total* | *761* |
+
+
+ ## Training procedure
+
+ Finetuned using Fairseq.
+
+ ## Evaluation results
+
+ ### WER
+
+ |Dataset | WER |
+ |---|---|
+ | jutusaated.devset | 7.9 |
+ | jutusaated.testset | 6.1 |
+ | Common Voice 6.1 | 12.5 |
+ | Common Voice 8.0 | 13.4 |
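
The metric values in the front matter change from fractions (for example `0.12520395591222402`) to percentages (`12.520395591222402`), which matches the WER table in the README body. As an illustration only, the sketch below shows one common way to compute a word error rate and express it as a percentage. It is not the evaluation script used for this model: the helper names `edit_distance`, `wer` and `as_percent` are hypothetical, and whitespace tokenization without text normalization is an assumption.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction: word edits / reference word count."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def as_percent(fraction):
    """Convert a fractional error rate to a percentage."""
    return 100.0 * fraction


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(as_percent(wer("a b c d", "a x c")))
    # The rescaling applied to the first WER value in this commit.
    print(round(as_percent(0.12520395591222402), 1))
```

Rounded to one decimal place, the rescaled front-matter values agree with the 12.5 and 13.4 reported for Common Voice 6.1 and 8.0 in the WER table.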