wannaphong committed 5a50236 (parent: c6caaf4): Create README.md

Files changed (1): README.md (added, +30 -0)
---
language:
- th
tags:
- automatic-speech-recognition
license: apache-2.0
datasets:
- common_voice
metrics:
- wer
- cer
---

# Thai CommonVoice V8 (newmm tokenizer)

This model was trained on the Common Voice V8 dataset. It extends the Common Voice V7 data used for [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th) with the additional data from Common Voice V8. It is a fine-tuned version of [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).

## Datasets

The dataset adds the new data from Common Voice V8 to the Common Voice V7 dataset. All clips that already appear in Common Voice V7 are removed from Common Voice V8, the remaining Common Voice V8 data is split, and the Common Voice V7 data is then added back.
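The split procedure above can be sketched in Python as follows. This is a minimal illustration, not the script used to build the dataset (that lives in the GitHub dataset repository listed under Links). The function name `build_v8_splits`, the representation of clips as plain ID strings, the 80/10/10 ratios, and the fixed seed are all hypothetical choices made for the sketch.

```python
import random


def build_v8_splits(v7_splits, v8_clips, ratios=(0.8, 0.1, 0.1), seed=42):
    """Hypothetical sketch: merge new Common Voice V8 clips into existing V7 splits.

    v7_splits: dict mapping "train" / "validation" / "test" to lists of clip IDs.
    v8_clips: list of all clip IDs in Common Voice V8.
    ratios: assumed train/validation/test proportions for the V8-only clips.
    """
    # 1. Remove every clip that already appears anywhere in the V7 dataset.
    v7_ids = {clip for clips in v7_splits.values() for clip in clips}
    new_clips = sorted(set(v8_clips) - v7_ids)

    # 2. Shuffle (deterministically) and split only the V8-only clips.
    rng = random.Random(seed)
    rng.shuffle(new_clips)
    n_train = int(len(new_clips) * ratios[0])
    n_val = int(len(new_clips) * ratios[1])
    new_splits = {
        "train": new_clips[:n_train],
        "validation": new_clips[n_train:n_train + n_val],
        "test": new_clips[n_train + n_val:],
    }

    # 3. Add the V7 clips back into the split they originally belonged to.
    return {name: v7_splits.get(name, []) + new_splits[name] for name in new_splits}
```

Keeping each V7 clip in its original split means no clip that was used for training a V7-based model can leak into the new validation or test sets.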

## Models

This model fine-tunes [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the Thai Common Voice V8 dataset. The transcripts are pre-tokenized with `pythainlp.tokenize.word_tokenize` (newmm tokenizer).
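A minimal sketch of the pre-tokenization step, assuming the words returned by `word_tokenize` with the `newmm` engine are joined with single spaces so that a wav2vec2 CTC tokenizer can treat the space as its word delimiter. The helper name `pretokenize` and its `tokenize` parameter (which lets another tokenizer be swapped in, for example in tests) are illustrative and not part of the original training code.

```python
def pretokenize(text, tokenize=None):
    """Hypothetical helper: rewrite a Thai transcript as words separated by single spaces."""
    if tokenize is None:
        # Imported lazily so the helper still works with another tokenizer if PyThaiNLP is absent.
        from pythainlp.tokenize import word_tokenize

        def tokenize(s):
            # "newmm" is PyThaiNLP's dictionary-based maximal-matching word tokenizer.
            return word_tokenize(s, engine="newmm", keep_whitespace=False)

    # Drop empty or whitespace-only tokens, then join the words with single spaces.
    words = [token.strip() for token in tokenize(text) if token.strip()]
    return " ".join(words)


# Usage (requires PyThaiNLP): print(pretokenize("ผมชอบกินข้าว"))
```

Since written Thai does not separate words with spaces, an explicit word boundary of this kind is what allows word-level units (and word error rate) to be defined for the transcripts.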

**Links:**
- GitHub Dataset: [https://github.com/wannaphong/thai_commonvoice_dataset](https://github.com/wannaphong/thai_commonvoice_dataset)

[WIP]