Update README.md
use correct tokenization for wer
README.md
CHANGED
@@ -117,10 +117,11 @@ training_args = TrainingArguments(
 
 We benchmark on the test set using WER with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation codes can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`
 
-|
-
-| without spell correction | 0.
-| with spell correction | 0.
+|                               | WER        | CER        |
+|-------------------------------|------------|------------|
+| Ours without spell correction | 0.13634024 | 0.02813019 |
+| Ours with spell correction    | 0.17996397 | 0.05225761 |
+| Google Web Speech API         | 0.13711234 | 0.07357340 |
 
 ## Ackowledgements
 * model training and validation notebooks/scripts [@cstorm125](https://github.com/cstorm125/)
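Thai text is written without spaces between words, so the WER in the table depends on how words are segmented, which is what this commit corrects. The following is a minimal sketch of how WER and CER can be computed from a plain Levenshtein distance. It is not the evaluation code from `notebooks/wav2vec2_finetuning_tutorial.ipynb`, which may differ. The names `edit_distance`, `wer`, and `cer` are illustrative, the tokenizer is passed in as a parameter, and counting spaces as characters in CER is an assumption.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str,
        tokenize: Callable[[str], List[str]] = str.split) -> float:
    """Word error rate. `tokenize` defaults to whitespace splitting (an
    assumption for illustration); for Thai, pass a word segmenter instead."""
    ref_tokens = tokenize(ref)
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, tokenize(hyp)) / len(ref_tokens)


def cer(ref: str, hyp: str) -> float:
    """Character error rate. Spaces are counted as characters here (assumption)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For Thai transcripts, one would pass a word segmenter such as `pythainlp.tokenize.word_tokenize` as the `tokenize` argument, for example `wer(ref, hyp, tokenize=word_tokenize)`. The default whitespace split is only meaningful for text that is already space-delimited.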