cstorm125 committed
Commit
27ad5f2
1 Parent(s): a688881

Update README.md

add deepcut benchmark

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -115,16 +115,16 @@ training_args = TrainingArguments(
 
  ## Evaluation
 
- We benchmark on the test set using WER with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation codes can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`. Benchmark is performed on `test-unique` split.
-
- | | WER | CER |
- |-------------------------------|------------|------------|
- | Ours without spell correction | 0.13634024 | **0.02813019** |
- | Ours with spell correction | 0.17996397 | 0.05225761 |
- | [Google Web Speech API](https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API) | 0.13711234 | 0.07357340 |
- | [Microsoft Bing Speech API](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-api/) | **0.12578819** | 0.05016620 |
- | [Amazon Transcribe](https://aws.amazon.com/transcribe/) | 0.2186334 | 0.07077562 |
- | [NECTEC AI for Thai Partii API](https://aiforthai.in.th/aiplatform/#/speechtotext)※| 0.20105887 | 0.09551027 |
+ We benchmark on the test set using WER with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and [deepcut](https://github.com/rkcosmos/deepcut), and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation codes can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`. Benchmark is performed on `test-unique` split.
+
+ | | WER PyThaiNLP 2.3.1 | WER deepcut | CER |
+ |--------------------------------|---------------------|----------------|----------------|
+ | Ours without spell correction | 0.13634024 | **0.08152052** | **0.02813019** |
+ | Ours with spell correction | 0.17996397 | 0.14167975 | 0.05225761 |
+ | Google Web Speech API※ | 0.13711234 | 0.10860058 | 0.07357340 |
+ | Microsoft Bing Speech API※ | **0.12578819** | 0.09620991 | 0.05016620 |
+ | Amazon Transcribe※ | 0.2186334 | 0.14487553 | 0.07077562 |
+ | NECTEC AI for Thai Partii API | 0.20105887 | 0.15515631 | 0.09551027 |
 
  ※ APIs are not finetuned with Common Voice 7.0 data
 
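Note on the metrics in this hunk: the WER columns differ only in which word tokenizer segments the Thai text before scoring, which is why the same transcripts give different WER under PyThaiNLP and deepcut. The following is a minimal sketch of that idea, not the code from `notebooks/wav2vec2_finetuning_tutorial.ipynb`. The helper names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, and pooling edits over the whole corpus (rather than averaging per utterance) and counting spaces as characters in CER are assumptions. The tokenizer is passed in as a callable so the sketch runs without either library installed.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], List[str]]) -> float:
    """Total edits divided by total reference length (corpus-level; an assumption)."""
    edits = 0
    ref_len = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_tokens, hyp_tokens)
        ref_len += len(ref_tokens)
    return edits / ref_len


def wer(refs: List[str], hyps: List[str],
        tokenize: Callable[[str], List[str]]) -> float:
    """Word error rate under the supplied word tokenizer."""
    return error_rate(refs, hyps, tokenize)


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate; every character, spaces included, is a token (assumption)."""
    return error_rate(refs, hyps, list)


# Whitespace splitting stands in for a Thai word tokenizer in this demo.
# For Thai text, `pythainlp.word_tokenize` or `deepcut.tokenize` could be
# passed instead (both assumed to be installed separately).
refs = ["a b c d"]
hyps = ["a x c"]
demo_wer = wer(refs, hyps, str.split)
demo_cer = cer(refs, hyps)
```

Because only the `tokenize` argument changes between the two WER columns, differences between them reflect segmentation choices as well as recognition errors.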