---
language:
- uz
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- generated_from_trainer
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: XLS-R-300M Uzbek CV8
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: uz
    metrics:
    - name: Test WER
      type: wer
      value: 40.56
    - name: Test CER
      type: cer
      value: 8.25
---

# XLS-R-300M Uzbek CV8

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UZ dataset.

## Model description

For a description of the model architecture, see [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).

The model vocabulary consists of the [Modern Latin alphabet for Uzbek](https://en.wikipedia.org/wiki/Uzbek_alphabet), with punctuation removed. Note that the characters ‘ and ’ do not count as punctuation: ‘ modifies the letters o and g (forming o‘ and g‘), and ’ indicates the glottal stop.

## Intended uses & limitations

This model is expected to be of some utility for low-fidelity use cases such as:

- Draft video captions
- Indexing of recorded broadcasts

The model is not reliable enough to substitute for live captions for accessibility purposes. It should not be used in any manner that would infringe the privacy of contributors to the Common Voice dataset or of any other speakers.

## Training and evaluation data

A 30% subset of the official Common Voice `train` split was used as training data. Half of the official `dev` split was used as validation data, and the full `test` split was used for final evaluation.

### Framework versions

- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0
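
The vocabulary rule given under "Model description" (punctuation removed, but ‘ and ’ kept because they are part of Uzbek orthography) can be illustrated with a small text normalizer. This is a minimal sketch, not the preprocessing actually used to train this model: the function name `normalize_transcript` is hypothetical, and lowercasing, replacing punctuation with a space, and treating U+2018 and U+2019 as the two protected characters are all assumptions.

```python
import re
import unicodedata

# Assumption: the two characters the card protects are U+2018 (‘) and U+2019 (’).
KEEP = {"\u2018", "\u2019"}


def normalize_transcript(text: str) -> str:
    """Lowercase text and drop punctuation, except for the protected ‘ and ’.

    Hypothetical helper for illustration only; lowercasing and replacing
    punctuation with a space are assumptions, not documented training steps.
    """
    out = []
    for ch in text.lower():
        if ch in KEEP:
            # ‘ and ’ are themselves in Unicode punctuation categories (Pi/Pf),
            # so they must be checked before the generic punctuation test.
            out.append(ch)
        elif unicodedata.category(ch).startswith("P"):
            # Replace any other punctuation mark with a space.
            out.append(" ")
        else:
            out.append(ch)
    # Collapse runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", "".join(out)).strip()


if __name__ == "__main__":
    print(normalize_transcript("O‘zbekiston, ma’no!"))
```

Transcripts that use other apostrophe-like characters (for example the ASCII `'` or U+02BB) would need to be mapped to the protected characters first; this sketch would strip or keep them according to their Unicode category.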