---
license: cc-by-4.0
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
language: et
model-index:
- name: xls-r-300m-et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice
      type: common_voice
      args: et
    metrics:
    - name: Test WER
      type: wer
      value: 12.520395591222402
    - name: Test CER
      type: cer
      value: 2.7091152438624897
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: et
    metrics:
    - name: Test WER
      type: wer
      value: 13.38447882323104
    - name: Test CER
      type: cer
      value: 2.9816686199500255
---
# XLS-R-300m-ET
This is an XLS-R-300M model ([facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m)) fine-tuned on around 800 hours of diverse Estonian data.
## Model description
This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists only of the CTC-based end-to-end model; no language model is currently provided.
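
Without a language model, the output of a CTC model is typically decoded greedily: take the most likely label per frame, merge consecutive repeats, and drop the blank symbol. The sketch below illustrates that idea on a toy example. The vocabulary, the blank ID, and the `|` word delimiter are assumptions made for illustration and are not taken from this model's actual tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame labels, drop blanks, and join tokens into text."""
    # Step 1: merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Step 2: remove the blank symbol, which only separates repeated labels.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real model uses its own character set.
vocab = {0: "<pad>", 1: "|", 2: "t", 3: "e", 4: "r"}
# Per-frame argmax labels, as they might come out of the acoustic model.
frames = [2, 2, 0, 3, 3, 0, 4, 4, 0, 3, 1, 1, 0, 2, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # prints "tere te"
```

Note that a blank between two identical labels keeps them apart, which is how doubled letters survive the collapsing step.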
## Intended uses & limitations
This model is intended for general-purpose speech recognition of material such as broadcast conversations, interviews, and talks.
## How to use
TODO
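
Pending official instructions, the following is a minimal sketch of how a CTC checkpoint like this one could be used through the `transformers` ASR pipeline. It assumes that the model is published on the Hugging Face Hub under the repository ID `TalTechNLP/xls-r-300m-et` and that the repository contains a Transformers-compatible checkpoint and tokenizer; neither is confirmed by this README. The function name `transcribe` is hypothetical.

```python
# Assumption: the Hub repository ID; adjust it if the model lives elsewhere.
MODEL_ID = "TalTechNLP/xls-r-300m-et"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcript of an audio file, decoded without a language model."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # When given a file path, the pipeline decodes the file with ffmpeg and
    # resamples it to the rate expected by the model's feature extractor.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, would download the model on first use and return the recognized text as a string. Decoding audio files this way requires `ffmpeg` to be installed.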
### Limitations and bias
Since this model was trained on mostly broadcast speech and texts from the web, it might have problems correctly decoding the following:
* Speech containing technical and other domain-specific terms
* Children's speech
* Non-native speech
* Speech recorded under very noisy conditions or with a microphone far from the speaker
* Very spontaneous and overlapping speech
## Training data
Acoustic training data:
| Type | Amount (h) |
|-----------------------|:------:|
| Broadcast speech | 591 |
| Spontaneous speech | 53 |
| Elderly speech corpus | 53 |
| Talks, lectures | 49 |
| Parliament speeches | 31 |
| *Total* | *761* |
## Training procedure
The model was fine-tuned using Fairseq.
## Evaluation results
### WER
| Dataset | WER (%) |
|---|---|
| jutusaated.devset | 7.9 |
| jutusaated.testset | 6.1 |
| Common Voice 6.1 | 12.5 |
| Common Voice 8.0 | 13.4 |
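
For reference, word error rate is usually computed as the word-level edit distance between the reference and the hypothesis, summed over all utterances and divided by the total number of reference words. The sketch below shows that computation, plus the character-level variant (CER). It is an illustration only and not the scoring script behind the numbers above; text normalization is omitted, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent; unit is "word" (WER) or "char" (CER)."""
    split = str.split if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * errors / total


# Made-up example sentences, not drawn from any of the test sets above.
refs = ["tere tulemast koju"]
hyps = ["tere tulemast kodu"]
print(f"WER: {error_rate(refs, hyps, 'word'):.1f}")  # one of three words is wrong
print(f"CER: {error_rate(refs, hyps, 'char'):.1f}")  # one of eighteen characters is wrong
```

Summing errors over the whole corpus before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the result.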