|
--- |
|
language: xal |
|
tags: |
|
- speech |
|
- audio |
|
- automatic-speech-recognition |
|
license: apache-2.0 |
|
--- |
|
|
|
## Info |
|
|
|
This Wav2Vec2 model was first pretrained on 500 hours Kalmyk TV recordings and 1000 hours Mongolian speech recognition dataset. After that, the model was finetuned on a 300 hours [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets) created by a voice conversion model. |
|
* 50% WER on a private test set created from Kalmyk TV recordnings |
|
* on clean voice recordings, the model should have much lower WER |
|
* voice conversion info |
|
* 300 hours [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets) |
|
* The source voice is a Kalmyk female voice TTS |
|
* Target voices are from the VCTK dataset |
|
* example data: https://twitter.com/tugstugi/status/1409111296897912835 |
|
* each WAV has a different text created from Kalmyk books |
|
|