---
license: apache-2.0
language: fr
library_name: transformers
thumbnail: null
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- robust-speech-event
- CTC
- Wav2vec2
datasets:
- common_voice
- mozilla-foundation/common_voice_11_0
- facebook/multilingual_librispeech
- polinaeterna/voxpopuli
- gigant/african_accented_french
metrics:
- wer
model-index:
- name: Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 14.80
    - name: Test WER (+LM)
      type: wer
      value: 12.61
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech (MLS)
      type: facebook/multilingual_librispeech
      args: french
    metrics:
    - name: Test WER
      type: wer
      value: 9.39
    - name: Test WER (+LM)
      type: wer
      value: 8.06
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VoxPopuli
      type: polinaeterna/voxpopuli
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 11.80
    - name: Test WER (+LM)
      type: wer
      value: 9.94
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: African Accented French
      type: gigant/african_accented_french
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 22.98
    - name: Test WER (+LM)
      type: wer
      value: 20.73
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 17.88
    - name: Test WER (+LM)
      type: wer
      value: 14.01
---
# Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French

<style>
img {
  display: inline;
}
</style>

![Model architecture](https://img.shields.io/badge/Model_Architecture-Wav2Vec2--CTC-lightgrey)
![Model size](https://img.shields.io/badge/Params-962M-lightgrey)
![Language](https://img.shields.io/badge/Language-French-lightgrey)

This model is a version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) fine-tuned for French on the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech), [VoxPopuli](https://github.com/facebookresearch/voxpopuli), [Multilingual TEDx](http://www.openslr.org/100), [MediaSpeech](https://www.openslr.org/108), and [African Accented French](https://huggingface.co/datasets/gigant/african_accented_french), using speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.

*Generally, we advise using [bofenghuang/asr-wav2vec2-ctc-french](https://huggingface.co/bofenghuang/asr-wav2vec2-ctc-french) because it is smaller and performs better.*
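The 16 kHz input requirement stated above can be checked before running inference. The following is a minimal sketch, not part of the original model card: it assumes the input is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and chosen for illustration.

```python
import wave

# Sampling rate (in Hz) that the model card asks inputs to match.
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the file's rate differs from the rate the model expects."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling` returns `True` for a file, resample the audio to 16 kHz before passing it to the model. One option is `librosa.load(path, sr=16000)`, which resamples while loading.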