w11wo committed on
Commit
ffb8ccd
1 Parent(s): 863a5e1

Update README.md

  1. README.md +15 -26
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
8
  datasets:
9
  - common_voice
10
  model-index:
11
- - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM
12
  results:
13
  - task:
14
  name: Automatic Speech Recognition
@@ -20,7 +20,7 @@ model-index:
20
  metrics:
21
  - name: Test CER
22
  type: cer
23
- value: 12.14
24
  - task:
25
  name: Automatic Speech Recognition
26
  type: automatic-speech-recognition
@@ -31,45 +31,34 @@ model-index:
31
  metrics:
32
  - name: Test CER
33
  type: cer
34
- value: 56.86
35
  ---
36
 
37
- # Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM
38
 
39
- Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset. A 5-gram Language model, trained on multiple [PyCantonese](https://pycantonese.org/data.html) corpora, was then subsequently added to this model.
40
 
41
  This model was trained using HuggingFace's PyTorch framework and is part of the [Robust Speech Challenge Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614) organized by HuggingFace. All training was done on a Tesla V100, sponsored by OVH.
42
 
43
- All necessary scripts used for training can be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tree/main) tab, as well as the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tensorboard) logged via TensorBoard.
44
-
45
- As for the N-gram language model training, we followed the [blog post tutorial](https://huggingface.co/blog/wav2vec2-with-ngram) provided by HuggingFace.
46
 
47
  ## Model
48
 
49
- | Model | #params | Arch. | Training/Validation data (text) |
50
- | --------------------------------- | ------- | ----- | ------------------------------- |
51
- | `wav2vec2-xls-r-300m-zh-HK-lm-v2` | 300M | XLS-R | `Common Voice zh-HK` Dataset |
52
 
53
  ## Evaluation Results
54
 
55
- The model achieves the following results on evaluation without a language model:
56
-
57
- | Dataset | CER |
58
- | -------------------------------- | ------ |
59
- | `Common Voice` | 31.73% |
60
- | `Robust Speech Event - Dev Data` | 56.60% |
61
 
62
- With the addition of the language model, it achieves the following results:
63
-
64
- | Dataset | CER |
65
- | -------------------------------- | ------ |
66
- | `Common Voice` | 12.14% |
67
- | `Robust Speech Event - Dev Data` | 56.86% |
68
 
69
  ## Training procedure
70
 
71
- The training process did not involve the addition of a language model. The following results were simply lifted from the original automatic speech recognition [model training](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-korean).
72
-
73
  ### Training hyperparameters
74
 
75
  The following hyperparameters were used during training:
@@ -171,7 +160,7 @@ Do consider the biases which came from pre-training datasets that may be carried
171
 
172
  ## Authors
173
 
174
- Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on OVH Cloud.
175
 
176
  ## Framework versions
177
 
 
8
  datasets:
9
  - common_voice
10
  model-index:
11
+ - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
12
  results:
13
  - task:
14
  name: Automatic Speech Recognition
 
20
  metrics:
21
  - name: Test CER
22
  type: cer
23
+ value: 31.73
24
  - task:
25
  name: Automatic Speech Recognition
26
  type: automatic-speech-recognition
 
31
  metrics:
32
  - name: Test CER
33
  type: cer
34
+ value: 56.60
35
  ---
36
 
37
+ # Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
38
 
39
+ Wav2Vec2 XLS-R 300M Cantonese (zh-HK) is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
40
 
41
  This model was trained using HuggingFace's PyTorch framework and is part of the [Robust Speech Challenge Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614) organized by HuggingFace. All training was done on a Tesla V100, sponsored by OVH.
42
 
43
+ All necessary scripts used for training can be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tree/main) tab, as well as the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tensorboard) logged via TensorBoard.
 
 
44
 
45
  ## Model
46
 
47
+ | Model | #params | Arch. | Training/Validation data (text) |
48
+ | ------------------------------ | ------- | ----- | ------------------------------- |
49
+ | `wav2vec2-xls-r-300m-zh-HK-v2` | 300M | XLS-R | `Common Voice zh-HK` Dataset |
50
 
51
  ## Evaluation Results
52
 
53
+ The model achieves the following results on evaluation:
 
 
 
 
 
54
 
55
+ | Dataset | Loss | CER |
56
+ | -------------------------------- | ------ | ------ |
57
+ | `Common Voice` | 0.8089 | 31.73% |
58
+ | `Robust Speech Event - Dev Data` | N/A | 56.60% |
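
The CER column in the table above is a character error rate. The model card does not say how it was computed, so the following is only a minimal sketch of the usual definition (total character-level edit distance divided by total reference length), not the author's evaluation script. The function names `levenshtein` and `cer` and the sample strings are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


if __name__ == "__main__":
    # Made-up example: one character missing from a three-character reference.
    print(f"{cer(['你好嗎'], ['你好']):.2%}")
```

Under this definition, a reported CER of 31.73% would correspond to roughly one character-level edit for every three reference characters.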
 
 
59
 
60
  ## Training procedure
61
 
 
 
62
  ### Training hyperparameters
63
 
64
  The following hyperparameters were used during training:
 
160
 
161
  ## Authors
162
 
163
+ Wav2Vec2 XLS-R 300M Cantonese (zh-HK) was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on OVH Cloud.
164
 
165
  ## Framework versions
166