---
license: apache-2.0
language: fr
library_name: transformers
thumbnail: null
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- robust-speech-event
- CTC
- Wav2vec2
datasets:
- common_voice
- mozilla-foundation/common_voice_11_0
- facebook/multilingual_librispeech
- polinaeterna/voxpopuli
- gigant/african_accented_french
metrics:
- wer
model-index:
- name: Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 14.8
    - name: Test WER (+LM)
      type: wer
      value: 12.61
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech (MLS)
      type: facebook/multilingual_librispeech
      args: french
    metrics:
    - name: Test WER
      type: wer
      value: 9.39
    - name: Test WER (+LM)
      type: wer
      value: 8.06
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VoxPopuli
      type: polinaeterna/voxpopuli
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 11.8
    - name: Test WER (+LM)
      type: wer
      value: 9.94
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: African Accented French
      type: gigant/african_accented_french
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 22.98
    - name: Test WER (+LM)
      type: wer
      value: 20.73
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 17.88
    - name: Test WER (+LM)
      type: wer
      value: 14.01
---
Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b, trained on French speech sampled at 16 kHz using the train and validation splits of Common Voice 11.0, Multilingual LibriSpeech, VoxPopuli, Multilingual TEDx, MediaSpeech, and African Accented French. When using the model, make sure your speech input is also sampled at 16 kHz.
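If your recordings use a different rate (for example 44.1 kHz), they need to be resampled to 16 kHz before inference. In practice you would call an existing resampler such as `torchaudio.functional.resample`; the sketch below is only a dependency-free illustration of band-limited (windowed-sinc) resampling, assuming mono audio stored as a Python list of floats. The function name `resample` and its `zero_crossings` parameter are illustrative choices, not part of this model or of the transformers API.

```python
import math


def resample(samples: list[float], orig_sr: int, target_sr: int = 16_000,
             zero_crossings: int = 16) -> list[float]:
    """Windowed-sinc resampling sketch (assumption: mono audio as floats)."""
    if orig_sr == target_sr:
        return list(samples)
    # When downsampling, the low-pass cutoff must drop to target_sr / 2 to
    # avoid aliasing; `scale` expresses that cutoff relative to orig_sr / 2.
    scale = min(1.0, target_sr / orig_sr)
    # Half-width of the truncated kernel, measured in input samples.
    half_width = zero_crossings / scale
    step = orig_sr / target_sr  # input samples advanced per output sample
    n_out = len(samples) * target_sr // orig_sr
    out = []
    for n in range(n_out):
        pos = n * step  # where output sample n falls on the input time axis
        lo = max(0, math.ceil(pos - half_width))
        hi = min(len(samples) - 1, math.floor(pos + half_width))
        acc = 0.0
        for k in range(lo, hi + 1):
            d = pos - k
            # Scaled sinc: ideal low-pass filter at the chosen cutoff.
            z = scale * d
            sinc = 1.0 if z == 0 else math.sin(math.pi * z) / (math.pi * z)
            # Hann window tapers the truncated sinc to limit ripple.
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            acc += samples[k] * scale * sinc * window
        out.append(acc)
    return out


# Usage: a 0.1 s, 440 Hz tone recorded at 44.1 kHz, converted to 16 kHz.
sr_in = 44_100
tone = [math.sin(2 * math.pi * 440 * i / sr_in) for i in range(sr_in // 10)]
audio_16k = resample(tone, sr_in)  # 1600 samples at 16 kHz
```

The resulting 16 kHz samples can then be handed to the model's processor. The pure-Python double loop costs roughly (signal length × kernel width) operations, which is why a vectorized library resampler is the better choice for real recordings.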
Generally, we recommend using bofenghuang/asr-wav2vec2-ctc-french instead, as it is a smaller model and performs better.
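Performance in this card is reported as word error rate (WER), the "Test WER" metric in the metadata (the values appear to be percentages). WER is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words. The sketch below computes it with a standard dynamic-programming edit distance, assuming plain whitespace tokenization and no text normalization; the reported scores may depend on normalization choices (casing, punctuation) that are not described here, so results from this function are not directly comparable to them. Libraries such as jiwer provide maintained implementations.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution (est -> et) and one deletion (le): 2 edits / 6 words.
score = wer("le chat est sur le tapis", "le chat et sur tapis")
```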