jbalam-nv committed on
Commit
e3b6854
1 Parent(s): 0263640

Update README.md

Files changed (1)
  1. README.md +7 -9
README.md CHANGED
@@ -90,7 +90,7 @@ img {
  | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |


- This model was trained on a composite dataset (NeMo ASRSET) comprising of over a thousand hours of French speech.
+ This model was trained on a composite dataset comprising of over 1500 hours of French speech.
  It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
  It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
@@ -127,7 +127,7 @@ asr_model.transcribe(['2086-149220-0033.wav'])

  ```shell
  python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
- pretrained_name="nvidia/stt_en_conformer_ctc_large"
+ pretrained_name="nvidia/stt_fr_conformer_ctc_large"
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
  ```

@@ -149,14 +149,14 @@ The NeMo toolkit [3] was used for training the models for over several hundred e

  The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).

- The checkpoint of the language model used as the neural rescorer can be found [here]( https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)
+ The checkpoint of the language model used for rescoring can be found [here]( https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)

  ## Datasets
  All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising of over a thousand hours of French speech:

- - MozillaCommonVoice7.0 356 hours
- - MultilingualLibreSpeech 1036 hours
- - VoxPopuli 182 hours
+ - MozillaCommonVoice 7.0 - 356 hours
+ - Multilingual LibriSpeech - 1036 hours
+ - VoxPopuli - 182 hours

  Both models use same dataset, excluding a preprocessing step to strip hyphen from data for secondary model's training.

@@ -170,7 +170,7 @@ The latest model obtains the following greedy scores on the following evaluation
  - 5.88 % on MLS dev
  - 4.91 % on MLS test

- With 128 beam search and 4gram KenLM model (included with this model):
+ With 128 beam search and 4gram KenLM model:

  - 7.95 % on MCV7.0 dev
  - 9.16 % on MCV7.0 test
@@ -205,5 +205,3 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

  - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

-
- ---
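
The revised summary line says "over 1500 hours", while the Datasets list gives per-corpus figures. As a quick sanity check, those figures can be summed. This is only an illustrative sketch: the variable names are hypothetical, and the hours are copied from the Datasets hunk of this diff.

```python
# Per-corpus hours as listed in the Datasets section of the README diff.
dataset_hours = {
    "MozillaCommonVoice 7.0": 356,
    "Multilingual LibriSpeech": 1036,
    "VoxPopuli": 182,
}

# Sum the corpora and compare against the "over 1500 hours" wording.
total_hours = sum(dataset_hours.values())
exceeds_1500 = total_hours > 1500

print(f"total: {total_hours} hours")
print(f"over 1500 hours: {exceeds_1500}")
```

The listed corpora add up to 1574 hours, which is consistent with the new "over 1500 hours" wording. The unchanged context line in the Datasets section still reads "over a thousand hours".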