wannaphong committed on
Commit
28339c5
1 Parent(s): 9fd958f

Update README.md

  1. README.md +24 -2
README.md CHANGED
@@ -15,6 +15,8 @@ metrics:
15
 
16
  This model was trained on the Common Voice V8 dataset, which extends the Common Voice V7 dataset used by [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th). It was fine-tuned from [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
17
 
 
 
18
  ## Datasets
19
 
20
  The dataset adds the new data from Common Voice V8 to the Common Voice V7 dataset: all Common Voice V7 data is removed from Common Voice V8 before V8 is split, and the Common Voice V7 dataset is then added back.
@@ -25,8 +27,28 @@ It use [ekapolc/Thai_commonvoice_split](https://github.com/ekapolc/Thai_commonvo
25
 
26
  This model was fine-tuned from the [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) model on the Thai Common Voice V8 dataset, with the text pre-tokenized using `deepcut.tokenize`.
27
 
28
  **Links:**
29
  - GitHub Dataset: [https://github.com/wannaphong/thai_commonvoice_dataset](https://github.com/wannaphong/thai_commonvoice_dataset)
30
  - Deepcut: [https://github.com/rkcosmos/deepcut](https://github.com/rkcosmos/deepcut)
31
-
32
- [WIP]
 
15
 
16
  This model was trained on the Common Voice V8 dataset, which extends the Common Voice V7 dataset used by [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th). It was fine-tuned from [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
17
 
18
+ GitHub: [https://github.com/wannaphong/th-cv-v8-wav2vev2-deepcut](https://github.com/wannaphong/th-cv-v8-wav2vev2-deepcut)
19
+
20
  ## Datasets
21
 
22
  The dataset adds the new data from Common Voice V8 to the Common Voice V7 dataset: all Common Voice V7 data is removed from Common Voice V8 before V8 is split, and the Common Voice V7 dataset is then added back.
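The merge described above can be sketched as follows. This is only an illustration: the function name `extend_splits`, the split names, the ratios, and the seeded random shuffle are all assumptions made for this example, not the procedure actually used to build the dataset.

```python
import random


def extend_splits(v7_splits, v8_clips, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split only the clips that are new in V8, then add the V7 splits back.

    v7_splits: dict mapping "train"/"dev"/"test" to lists of clip IDs (hypothetical layout).
    v8_clips:  iterable of all clip IDs in V8.
    ratios:    assumed train/dev/test proportions for the new clips.
    """
    # Clips that already belong to a V7 split keep that assignment.
    seen = {clip for clips in v7_splits.values() for clip in clips}

    # Remove all V7 data from V8 before splitting.
    new_clips = sorted(set(v8_clips) - seen)
    random.Random(seed).shuffle(new_clips)

    n_train = int(len(new_clips) * ratios[0])
    n_dev = int(len(new_clips) * ratios[1])
    new_splits = {
        "train": new_clips[:n_train],
        "dev": new_clips[n_train:n_train + n_dev],
        "test": new_clips[n_train + n_dev:],
    }

    # Add the V7 data back, split by split.
    return {
        name: list(v7_splits[name]) + new_splits[name]
        for name in ("train", "dev", "test")
    }
```

Under these assumptions, no clip from the V7 test set can end up in the extended training set.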
 
27
 
28
  This model was fine-tuned from the [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) model on the Thai Common Voice V8 dataset, with the text pre-tokenized using `deepcut.tokenize`.
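A minimal sketch of the pre-tokenization step, assuming it means inserting spaces between the words of each transcript before training. The helper name `pretokenize` is hypothetical; the tokenizer is passed in as a callable so the example runs without deepcut installed.

```python
def pretokenize(text, tokenize):
    """Return `text` with its words separated by single spaces.

    `tokenize` is any callable mapping a string to a list of tokens,
    for example `deepcut.tokenize`.
    """
    # Drop whitespace-only tokens so existing spaces are not doubled.
    tokens = (token.strip() for token in tokenize(text))
    return " ".join(token for token in tokens if token)


# With deepcut installed, usage would look like:
#   import deepcut
#   pretokenize(transcript, deepcut.tokenize)
```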
29
 
30
+ ## Evaluation
31
+
32
+ **Test with the Common Voice V8 test set**
33
+
34
+ | Model | WER by newmm (%) | WER by deepcut (%) | CER (%) | URL |
35
+ |-----------------------|------------------|--------------------|----------|-------------------------------------------------------------|
36
+ | wav2vec2 with deepcut | 16.354521 | 11.424476 | 3.684060 | https://github.com/wannaphong/th-cv-v8-wav2vev2-deepcut |
37
+ | wav2vec2 with newmm | 16.698299 | 11.436941 | 3.737407 | https://github.com/wannaphong/thai-wav2vec2-cv-v8 |
38
+ | CV v7 | 17.414503 | 11.923089 | 3.854153 | https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th |
39
+
40
+ **Test with the Common Voice V7 test set (the same test set used by CV V7)**
41
+
42
+ | Model | WER by newmm (%) | WER by deepcut (%) | CER (%) | URL |
43
+ |-----------------------|------------------|--------------------|----------|-------------------------------------------------------------|
44
+ | wav2vec2 with deepcut | 12.776381 | 8.773006 | 2.628882 | https://github.com/wannaphong/th-cv-v8-wav2vev2-deepcut |
45
+ | wav2vec2 with newmm | 12.750596 | 8.672616 | 2.623341 | https://github.com/wannaphong/thai-wav2vec2-cv-v8 |
46
+ | CV v7 | 13.936698 | 2.804787 | 2.804787 | https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th |
47
+
48
+ This uses the same test set as [https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th).
49
+
50
+ Benchmark source code: https://github.com/wannaphong/thai-asr-benchmark/tree/main/commonvoice
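For readers unfamiliar with the metrics in the tables, here is a minimal sketch of how WER and CER are typically computed from an edit distance. It is not the benchmark code linked above. The function names are hypothetical, and removing spaces before computing CER is an assumption. "WER by newmm" and "WER by deepcut" would correspond to passing a different word tokenizer as `tokenize`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Corpus-level error rate in percent, for any tokenizer."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * errors / total


def cer(references, hypotheses):
    """Character error rate in percent (assumption: spaces are ignored)."""
    return error_rate(references, hypotheses, lambda s: list(s.replace(" ", "")))
```

For Thai text, `tokenize` could be `deepcut.tokenize`, or `pythainlp.tokenize.word_tokenize` with `engine="newmm"`. For a quick check on space-separated text, `str.split` is enough.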
51
+
52
  **Links:**
53
  - GitHub Dataset: [https://github.com/wannaphong/thai_commonvoice_dataset](https://github.com/wannaphong/thai_commonvoice_dataset)
54
  - Deepcut: [https://github.com/rkcosmos/deepcut](https://github.com/rkcosmos/deepcut)