tugstugi/wav2vec2-large-xlsr-53-kalmyk

Automatic Speech Recognition

tugstugi committed on Jul 25, 2021

Commit 9bbe0c8 · 1 Parent(s): 55ebce0

Update README.md

Files changed (1)

README.md +9 -1

README.md CHANGED

@@ -9,4 +9,12 @@ license: apache-2.0
 ## Info
-Wav2Vec XLSR finetuned on the Kalmyk Bible.
+This Wav2Vec2 model was first pretrained on 500 hours of Kalmyk TV recordings and a 1000-hour Mongolian speech recognition dataset. It was then fine-tuned on a 300-hour [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets) created by a voice conversion model.
+* 50% WER on a private test set created from Kalmyk TV recordings
+* On clean voice recordings, the model should have a much lower WER
+* Voice conversion details:
+  * 300-hour [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets)
+  * The source voice is a Kalmyk female TTS voice
+  * Target voices are from the VCTK dataset
+  * Example data: https://twitter.com/tugstugi/status/1409111296897912835
+  * Each WAV file has a different text created from Kalmyk books
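
The README reports a word error rate (WER) but does not say how it was computed. As a rough illustration only, the sketch below uses the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the placeholder sentences are hypothetical and not part of this repository. The author's actual evaluation script and text normalization are not shown in the commit, so this may differ from how the 50% figure was obtained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the author's evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder English tokens, not real Kalmyk transcripts:
    # one substitution (b -> x) plus one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 50% means the number of word-level edits needed to turn the model output into the reference transcript equals half the number of reference words. Because insertions count as errors, WER can exceed 100%.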