cstorm125 committed on
Commit
8b5f968
1 Parent(s): 5d0cede

Update README.md

use correct tokenization for wer

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -117,10 +117,11 @@ training_args = TrainingArguments(
 
 We benchmark on the test set using WER with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation codes can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`
 
-| | WER | CER |
-|--------------------------|------------|------------|
-| without spell correction | 0.20755690 | 0.02813019 |
-| with spell correction | 0.24592172 | 0.05225761 |
+| | WER | CER |
+|-------------------------------|------------|------------|
+| Ours without spell correction | 0.13634024 | 0.02813019 |
+| Ours with spell correction | 0.17996397 | 0.05225761 |
+| Google Web Speech API | 0.13711234 | 0.07357340 |
 
 ## Ackowledgements
 * model training and validation notebooks/scripts [@cstorm125](https://github.com/cstorm125/)
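
Note on the metric change: only the WER column moves in this commit while CER stays the same, which fits the commit message, since WER depends on how the text is split into words and CER does not. The sketch below illustrates that dependence. It is not the code from `notebooks/wav2vec2_finetuning_tutorial.ipynb`. The function names (`edit_distance`, `error_rate`, `wer`, `cer`), the corpus-level aggregation (total edits divided by total reference tokens), and counting spaces as characters in CER are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence[str]]) -> float:
    """Total edits over total reference tokens (assumed corpus-level aggregation)."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


def wer(refs: List[str], hyps: List[str],
        tokenize: Callable[[str], Sequence[str]] = str.split) -> float:
    """Word error rate; the tokenizer decides what counts as a word."""
    return error_rate(refs, hyps, tokenize)


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate; spaces count as characters here (an assumption)."""
    return error_rate(refs, hyps, list)
```

Thai text has no spaces between words, so the default `str.split` tokenizer would treat a whole phrase as one token. To follow the README's description, a word tokenizer such as `pythainlp.tokenize.word_tokenize` could be passed as `tokenize`. The README does not say which tokenization engine was used, so that detail remains open.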