bofenghuang/asr-wav2vec2-xls-r-1b-ctc-french

	---
	license: apache-2.0
	language: fr
	library_name: transformers
	thumbnail: null
	tags:
	- automatic-speech-recognition
	- hf-asr-leaderboard
	- robust-speech-event
	- CTC
	- Wav2vec2
	datasets:
	- common_voice
	- mozilla-foundation/common_voice_11_0
	- facebook/multilingual_librispeech
	- polinaeterna/voxpopuli
	- gigant/african_accented_french
	metrics:
	- wer
	model-index:
	- name: Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 11.0
	      type: mozilla-foundation/common_voice_11_0
	      args: fr
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 14.80
	    - name: Test WER (+LM)
	      type: wer
	      value: 12.61
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Multilingual LibriSpeech (MLS)
	      type: facebook/multilingual_librispeech
	      args: french
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 9.39
	    - name: Test WER (+LM)
	      type: wer
	      value: 8.06
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: VoxPopuli
	      type: polinaeterna/voxpopuli
	      args: fr
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 11.80
	    - name: Test WER (+LM)
	      type: wer
	      value: 9.94
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: African Accented French
	      type: gigant/african_accented_french
	      args: fr
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 22.98
	    - name: Test WER (+LM)
	      type: wer
	      value: 20.73
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Robust Speech Event - Dev Data
	      type: speech-recognition-community-v2/dev_data
	      args: fr
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 17.88
	    - name: Test WER (+LM)
	      type: wer
	      value: 14.01
	---

	# Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French

	<style>
	img {
	display: inline;
	}
	</style>

	![Model architecture](https://img.shields.io/badge/Model_Architecture-Wav2Vec2--CTC-lightgrey)
	![Model size](https://img.shields.io/badge/Params-962M-lightgrey)
	![Language](https://img.shields.io/badge/Language-French-lightgrey)

	This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) for French. It was trained on the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech), [VoxPopuli](https://github.com/facebookresearch/voxpopuli), [Multilingual TEDx](http://www.openslr.org/100), [MediaSpeech](https://www.openslr.org/108), and [African Accented French](https://huggingface.co/datasets/gigant/african_accented_french), using speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
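
The 16 kHz requirement can be illustrated with a minimal sketch. It is not the author's documented usage: the helper names `resample_linear` and `transcribe` are hypothetical, and the resampler is a naive linear interpolation with no anti-aliasing filter, included only to make the sample-rate step explicit. For real audio, a proper resampler (for example from `librosa` or `torchaudio`) would be the safer choice. The sketch assumes the model is loaded through the `transformers` automatic-speech-recognition pipeline, with `numpy` installed.

```python
TARGET_SR = 16_000  # sampling rate the model expects, per the paragraph above


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Naively resample a mono signal by linear interpolation.

    Illustration only: there is no low-pass filtering, so downsampling
    real speech this way can introduce aliasing.
    """
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr   # position in the original signal
        lo = min(int(pos), last)        # sample at or before that position
        hi = min(lo + 1, last)          # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def transcribe(samples, orig_sr):
    """Hypothetical helper: resample to 16 kHz, then run the ASR pipeline."""
    # Imported lazily so the resampling helper works without these packages.
    import numpy as np
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="bofenghuang/asr-wav2vec2-xls-r-1b-ctc-french",
    )
    audio = np.asarray(resample_linear(samples, orig_sr), dtype=np.float32)
    # The pipeline accepts raw audio as a dict with these two keys.
    return asr({"raw": audio, "sampling_rate": TARGET_SR})["text"]
```

Calling `transcribe(samples, 48_000)` on a mono list of float samples would downsample the audio by a factor of three before decoding. Note that the first call downloads the model weights.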

	In general, we recommend using [bofenghuang/asr-wav2vec2-ctc-french](https://huggingface.co/bofenghuang/asr-wav2vec2-ctc-french) instead, because it is smaller and performs better.