---
language:
- it
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- it
- mozilla-foundation/common_voice_11_0
- speech
- wav2vec2
model-index:
- name: XLS-R Wav2Vec2 CV11Ita by radiogroup crits
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0 italian
      type: mozilla-foundation/common_voice_11_0
      args: it
    metrics:
    - name: Test WER
      type: wer
      value: 7.12
    - name: Test CER
      type: cer
      value: 1.75
    - name: Test WER (+LM)
      type: wer
      value: 5.77
    - name: Test CER (+LM)
      type: cer
      value: 1.51
---

# XLS-R-1B-CV11ITA-LMWIKI500

## Fine-tuned XLS-R 1B model for speech recognition in Italian

This model is [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) fine-tuned on Italian using the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0).

When using this model, make sure that your speech input is sampled at 16 kHz.
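
If your audio uses a different sampling rate, resample it before inference. The sketch below assumes `numpy` and `scipy` are installed and uses SciPy's polyphase resampler. The helper name `to_16khz` is hypothetical and not part of this model's tooling.

```python
import math

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate the model expects


def to_16khz(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform to 16 kHz using polyphase filtering (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)
    if orig_sr == TARGET_SR:
        return audio
    # Reduce the rate ratio to the smallest integer up/down factors.
    g = math.gcd(orig_sr, TARGET_SR)
    up, down = TARGET_SR // g, orig_sr // g
    # resample_poly low-pass filters while resampling, which limits aliasing.
    return resample_poly(audio, up, down).astype(np.float32)


if __name__ == "__main__":
    sr = 48_000
    t = np.arange(sr) / sr  # one second of audio
    tone = np.sin(2 * np.pi * 440 * t)
    resampled = to_16khz(tone, sr)
    print(resampled.shape, resampled.dtype)
```

When the audio comes from a Hugging Face `datasets` object, casting the audio column with `Audio(sampling_rate=16_000)` achieves the same result.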

## Language model information

Our language model was built from a text dataset containing 500 characters from each Italian Wikipedia article.
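
The card does not describe the corpus preparation in more detail. The following is a minimal sketch of one way to assemble such a text corpus. It assumes that article texts are already extracted as plain strings and that the 500 characters are the *first* 500 of each article after whitespace normalization; both points are assumptions. `build_lm_corpus` is a hypothetical helper, and its output lines would serve as training text for the language model.

```python
import re
from typing import Iterable, List

MAX_CHARS = 500  # characters kept per article, as described above


def build_lm_corpus(articles: Iterable[str], max_chars: int = MAX_CHARS) -> List[str]:
    """Return one text line per article, truncated to `max_chars` characters.

    Assumption (not stated in the model card): the kept characters are the
    first `max_chars` of each article after collapsing whitespace.
    """
    lines = []
    for text in articles:
        # Collapse newlines, tabs and repeated spaces so each article fits on one line.
        snippet = re.sub(r"\s+", " ", text).strip()
        # Keep only the leading `max_chars` characters, then drop a trailing space.
        snippet = snippet[:max_chars].rstrip()
        if snippet:  # skip articles that are empty after cleaning
            lines.append(snippet)
    return lines


if __name__ == "__main__":
    corpus = build_lm_corpus(["Prima frase.\n\nSeconda frase.", "   "])
    print("\n".join(corpus))
```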
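
## Download the Common Voice 11.0 dataset for Italian

```python
from datasets import load_dataset

# Common Voice 11.0 is gated: accept its terms on the Hub and log in first
# (e.g. `huggingface-cli login`). Newer `datasets` releases use `token=True` instead.
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "it", use_auth_token=True)
```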

## Evaluation commands

To evaluate on `mozilla-foundation/common_voice_11_0` with split `test`, run `eval.py` twice. The first run uses `--greedy`, which decodes without the language model. The second run omits it, so decoding uses the language model. After each run, the output files are renamed with a `_greedy` or `_lm` suffix so that the second run does not overwrite the first.

```bash
python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-cv11ita-lmwiki500 --dataset mozilla-foundation/common_voice_11_0 --config it --split test --log_outputs --greedy
mv log_mozilla-foundation_common_voice_11_0_it_test_predictions.txt log_mozilla-foundation_common_voice_11_0_it_test_predictions_greedy.txt
mv log_mozilla-foundation_common_voice_11_0_it_test_targets.txt log_mozilla-foundation_common_voice_11_0_it_test_targets_greedy.txt
mv mozilla-foundation_common_voice_11_0_it_test_eval_results.txt mozilla-foundation_common_voice_11_0_it_test_eval_results_greedy.txt

python eval.py --model_id radiogroup-crits/wav2vec2-xls-r-1b-cv11ita-lmwiki500 --dataset mozilla-foundation/common_voice_11_0 --config it --split test --log_outputs
mv log_mozilla-foundation_common_voice_11_0_it_test_predictions.txt log_mozilla-foundation_common_voice_11_0_it_test_predictions_lm.txt
mv log_mozilla-foundation_common_voice_11_0_it_test_targets.txt log_mozilla-foundation_common_voice_11_0_it_test_targets_lm.txt
mv mozilla-foundation_common_voice_11_0_it_test_eval_results.txt mozilla-foundation_common_voice_11_0_it_test_eval_results_lm.txt
```
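
The WER and CER values in the metadata are percentages. The dependency-free sketch below shows what the two metrics measure. It assumes corpus-level aggregation (total edits divided by total reference length) and whitespace tokenization for words, with reference and predicted transcripts already loaded as lists of strings. It is not necessarily how `eval.py` computes its scores, since that script may use a metrics library and apply its own text normalization.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(references: List[str], predictions: List[str]) -> float:
    """Corpus-level word error rate: total word edits / total reference words."""
    edits = sum(edit_distance(r.split(), p.split()) for r, p in zip(references, predictions))
    return edits / sum(len(r.split()) for r in references)


def cer(references: List[str], predictions: List[str]) -> float:
    """Corpus-level character error rate: total char edits / total reference chars."""
    edits = sum(edit_distance(list(r), list(p)) for r, p in zip(references, predictions))
    return edits / sum(len(r) for r in references)


if __name__ == "__main__":
    refs = ["il gatto dorme sul divano"]
    hyps = ["il gatto dorme sul divani"]
    print(f"WER: {100 * wer(refs, hyps):.2f}%  CER: {100 * cer(refs, hyps):.2f}%")
```

The same transcripts typically score lower on CER than on WER, because a single wrong letter counts as one character edit but invalidates the whole word. This matches the gap between the WER and CER values reported above.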

## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{crits2023wav2vec2-xls-r-1b-cv11ita-lmwiki500,
  title={XLS-R Wav2Vec2 CV11Ita by radiogroup crits},
  author={Teraoni Prioletti, Raffaele and Casagranda, Paolo and Russo, Francesco},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/radiogroup-crits/wav2vec2-xls-r-1b-cv11ita-lmwiki500}},
  year={2023}
}
```