cstorm125 committed on
Commit
71ded5c
1 Parent(s): fc25f0f

Update README.md

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -18,6 +18,22 @@ Finetuning `wav2vec2-large-xlsr-53` on Thai [Common Voice 7.0](https://commonvoi
 
  We finetune [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) based on [Fine-tuning Wav2Vec2 for English ASR](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb) using Thai examples of [Common Voice Corpus 7.0](https://commonvoice.mozilla.org/en/datasets). The notebooks and scripts can be found in [vistec-ai/wav2vec2-large-xlsr-53-th](https://github.com/vistec-ai/wav2vec2-large-xlsr-53-th). The pretrained model and processor can be found at [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th).
 
+ ## `robust-speech-event`
+
+ Add `syllable_tokenize`, `word_tokenize` ([PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)) and [deepcut](https://github.com/rkcosmos/deepcut) tokenizers to `eval.py` from [robust-speech-event](https://github.com/huggingface/transformers/tree/master/examples/research_projects/robust-speech-event#evaluation)
+
+ ```
+ > python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config th --split test --log_outputs --thai_tokenizer newmm/syllable/deepcut/cer
+ ```
+
+ ### Eval results on Common Voice 7 "test":
+
+ | | WER PyThaiNLP 2.3.1 | WER deepcut | SER | CER |
+ |---------------------------------|---------------------|-------------|---------|---------|
+ | Only Tokenization | 0.9524% | 2.5316% | 1.2346% | 0.1623% |
+ | Cleaning rules and Tokenization | TBD | TBD | TBD | TBD |
+
+
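The four columns in the table above differ only in how each transcript is split into units (words, syllables, or characters) before the edit distance is computed. The following is a minimal sketch of that idea, not the code in `eval.py`: the names `edit_distance`, `error_rate`, `space_tokenize` and `char_tokenize` are hypothetical, and dropping whitespace before computing CER is an assumption. In practice a Thai tokenizer such as PyThaiNLP's `word_tokenize` or `deepcut.tokenize` would be passed in place of `space_tokenize`.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references: List[str], hypotheses: List[str],
               tokenize: Callable[[str], List[str]]) -> float:
    """Total edit errors divided by total reference tokens.

    The tokenizer decides the unit: words give WER, syllables give SER,
    characters give CER.
    """
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        hyp_tokens = tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total


def space_tokenize(text: str) -> List[str]:
    # Stand-in for a Thai word or syllable tokenizer (hypothetical helper).
    return text.split()


def char_tokenize(text: str) -> List[str]:
    # Character units for CER; ignoring whitespace is an assumption.
    return [c for c in text if not c.isspace()]


if __name__ == "__main__":
    refs = ["a b c d"]
    hyps = ["a x c"]
    # One substitution and one deletion over four reference tokens.
    print(error_rate(refs, hyps, space_tokenize))  # 0.5
```

Because the denominator is the number of reference units, the same pair of transcripts can score quite differently under word, syllable, and character tokenization, which is why the table reports them separately.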
  ## Usage
 
  ```