bofenghuang committed
Commit 02a26b0
1 Parent(s): b9072de

updt README.md

Files changed (1)
  1. README.md +21 -8
README.md CHANGED
@@ -1,15 +1,14 @@
 ---
-language:
-- fr
 license: apache-2.0
+language: fr
+library_name: transformers
+thumbnail: null
 tags:
 - automatic-speech-recognition
 - hf-asr-leaderboard
 - robust-speech-event
-- mozilla-foundation/common_voice_11_0
-- facebook/multilingual_librispeech
-- facebook/voxpopuli
-- gigant/african_accented_french
+- CTC
+- Wav2vec2
 datasets:
 - common_voice
 - mozilla-foundation/common_voice_11_0
@@ -91,12 +90,27 @@ model-index:
     - name: Test WER (+LM)
       type: wer
       value: 12.96
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Fleurs
+      type: google/fleurs
+      args: fr_fr
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 10.10
+    - name: Test WER (+LM)
+      type: wer
+      value: 8.84
 ---
 
 # Fine-tuned wav2vec2-FR-7K-large model for ASR in French
 
-This model is a fine-tuned version of [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) on French using the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech), [Voxpopuli](https://github.com/facebookresearch/voxpopuli), [Multilingual TEDx](http://www.openslr.org/100), [MediaSpeech](https://www.openslr.org/108), and [African Accented French](https://huggingface.co/datasets/gigant/african_accented_french) on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
+![Model architecture](https://img.shields.io/badge/Model_Architecture-Wav2Vec2--CTC-lightgrey)
 
+This model is a fine-tuned version of [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large), trained on a composite dataset comprising over 2,200 hours of French speech audio: the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech), [Voxpopuli](https://github.com/facebookresearch/voxpopuli), [Multilingual TEDx](http://www.openslr.org/100), [MediaSpeech](https://www.openslr.org/108), and [African Accented French](https://huggingface.co/datasets/gigant/african_accented_french). When using the model, make sure that your speech input is also sampled at 16 kHz.
 
 ## Usage
 
@@ -160,7 +174,6 @@ predicted_ids = torch.argmax(logits, dim=-1)
 predicted_sentence = processor.batch_decode(predicted_ids)[0]
 ```
 
-
 ## Evaluation
 
 1. To evaluate on `mozilla-foundation/common_voice_11_0`
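
The README's Usage section is only partially visible in this diff; just its final greedy-decoding lines appear as context in the last hunk. Below is a minimal sketch of how such a Wav2Vec2-CTC model is typically run with `transformers`, assuming a placeholder repo id (`<model-repo-id>` is not an identifier from the README) and an arbitrary local audio file.

```python
# Minimal greedy-decoding inference sketch; "<model-repo-id>" is a placeholder
# for this model's Hugging Face Hub id, not a real identifier from the README.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "<model-repo-id>"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Load an audio file, downmix to mono, and resample to the 16 kHz the model expects.
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0)
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding, mirroring the lines visible in the diff above.
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentence = processor.batch_decode(predicted_ids)[0]
print(predicted_sentence)
```

The "Test WER (+LM)" figures in the model-index presumably come from beam-search decoding with an external language model (for example via `pyctcdecode`), which this greedy sketch does not reproduce.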
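
The evaluation instructions are likewise truncated after the first item. As a rough sketch of what evaluating on `mozilla-foundation/common_voice_11_0` can look like with the `datasets` and `evaluate` libraries, one might do the following; the repository's own script, text normalization, and LM decoding will differ, so this will not reproduce the reported WER values exactly. Common Voice 11.0 is gated, so accept its terms and authenticate with the Hub first.

```python
# Rough WER-evaluation sketch, not the repository's evaluation script.
# "<model-repo-id>" is a placeholder for this model's Hub id.
import torch
import evaluate
from datasets import Audio, load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "<model-repo-id>"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# French test split, decoded on the fly at 16 kHz.
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

wer_metric = evaluate.load("wer")
predictions, references = [], []
for sample in dataset.select(range(100)):  # small subset, for illustration only
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(pred_ids)[0].lower())
    references.append(sample["sentence"].lower())

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```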