---
license: cc-by-4.0
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
language: et
model-index:
- name: xls-r-300m-et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice
      type: common_voice
      args: et
    metrics:
    - name: Test WER
      type: wer
      value: 12.520395591222402
    - name: Test CER
      type: cer
      value: 2.7091152438624897
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: et
    metrics:
    - name: Test WER
      type: wer
      value: 13.38447882323104
    - name: Test CER
      type: cer
      value: 2.9816686199500255
---

# XLS-R-300m-ET

This is an XLS-R-300M model ([facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m)) fine-tuned on around 800 hours of diverse Estonian data.

## Model description

This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists only of the CTC-based end-to-end model; no language model is currently provided.
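
Because no language model is provided, the output of a CTC model like this one would typically be obtained by greedy (best-path) decoding: take the most likely token for each frame, merge consecutive repeats, and drop the blank symbol. The sketch below illustrates only that collapsing step. The token names `<pad>` and `|` and the toy frame sequence are assumptions made for illustration; this card does not list the model's actual vocabulary.

```python
from itertools import groupby

# Assumed symbol names for illustration; not taken from this model's vocabulary.
BLANK = "<pad>"   # CTC blank symbol
WORD_SEP = "|"    # word-boundary symbol


def ctc_greedy_decode(frame_tokens, blank=BLANK, word_sep=WORD_SEP):
    """Collapse a per-frame best-token sequence into a text string."""
    # 1. Merge runs of the same token emitted on consecutive frames.
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. Remove blanks (a blank between two identical tokens keeps them distinct).
    chars = [token for token in collapsed if token != blank]
    # 3. Turn word separators into spaces and normalize whitespace.
    return " ".join("".join(chars).replace(word_sep, " ").split())


# Hypothetical per-frame argmax output for a short utterance.
frames = ["<pad>", "t", "t", "<pad>", "e", "r", "r", "<pad>", "e",
          "|", "|", "<pad>", "t", "e", "<pad>", "r", "e"]
print(ctc_greedy_decode(frames))  # prints "tere tere"
```

Decoding this way uses acoustic evidence only, which is one reason rare or domain-specific words can come out misspelled.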

## Intended uses & limitations

This model is intended for general-purpose speech recognition, such as transcribing broadcast conversations, interviews, and talks.

## How to use

TODO
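
One plausible way to run the model is through the `transformers` ASR pipeline. This is a sketch under several assumptions, not a confirmed recipe: that the checkpoint is published on the Hugging Face Hub in `transformers` format, that its repository ID is `TalTechNLP/xls-r-300m-et` (the ID is not stated in this card), and that `ffmpeg` is installed so the pipeline can decode and resample audio files. The 30-second chunk length is an arbitrary choice for long recordings.

```python
# Assumption: Hub repository ID; it is not stated in this model card.
MODEL_ID = "TalTechNLP/xls-r-300m-et"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the transformers ASR pipeline."""
    # Imported inside the function so that the heavy dependency
    # (and the model download) is only needed when transcribing.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # chunk_length_s splits long recordings into overlapping windows;
    # the pipeline resamples file input to the model's expected rate.
    result = asr(audio_path, chunk_length_s=30)
    return result["text"]


# Example call with a hypothetical file name:
# print(transcribe("sample.wav"))
```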

#### Limitations and bias

Because this model was trained mostly on broadcast speech and texts from the web, it may have trouble correctly decoding the following:

* Speech containing technical and other domain-specific terms
* Children's speech
* Non-native speech
* Speech recorded under very noisy conditions or with a microphone far from the speaker
* Very spontaneous and overlapping speech

## Training data

Acoustic training data:

| Type                  | Amount (h) |
|-----------------------|:------:|
| Broadcast speech      |   591  |
| Spontaneous speech    |   53   |
| Elderly speech corpus |   53   |
| Talks, lectures       |   49   |
| Parliament speeches   |   31   |
| *Total*               |  *761* |

## Training procedure

The model was fine-tuned using Fairseq.

## Evaluation results

### WER

| Dataset | WER (%) |
|---|---|
| jutusaated.devset | 7.9 |
| jutusaated.testset | 6.1 |
| Common Voice 6.1 | 12.5 |
| Common Voice 8.0 | 13.4 |
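
For readers who want to compare their own transcripts against these figures, the sketch below shows how word error rate (and the character error rate reported in the metadata) is conventionally computed from edit distance. It is an illustration only: this card does not say which scoring tool or text normalization produced the numbers above, so results from this code are not guaranteed to match them. The example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, all edits costing 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference item
                curr[j - 1] + 1,         # insertion of a hypothesis item
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one substituted word out of four.
print(wer("tere hommikust kallis kuulaja", "tere hommikust kallis kuulajad"))  # prints 25.0
```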