jbalam-nv committed on
Commit
65327ea
1 Parent(s): 33bdf78

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -148,7 +148,7 @@ Conformer-Transducer model is an autoregressive variant of Conformer model [1] f
148
 
149
  The NeMo toolkit [3] was used to train the models for several hundred epochs. These models were trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml).
150
 
151
- The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
152
 
153
  ## Datasets
154
  All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising over a thousand hours of French speech:
@@ -177,5 +177,12 @@ Since this model was trained on publicly available speech datasets, the performa
177
  Further, since portions of the training set contain text written both before and after the 1990 orthographic reform, regularity of punctuation may vary between the two styles.
178
  For downstream tasks requiring more consistency, fine-tuning or downstream processing may be required. If exact orthography is not necessary, then using a secondary model is advised.
179
 
180
 
181
 
148
 
149
  The NeMo toolkit [3] was used to train the models for several hundred epochs. These models were trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml).
150
 
151
+ The SentencePiece tokenizers [2] for these models were built from the text transcripts of the training set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
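
As an illustration of the first step in building such a tokenizer, the sketch below collects training transcripts into a plain-text corpus, one utterance per line. It assumes the training set is described by a NeMo-style JSON-lines manifest in which each entry carries a `text` field. The function name `extract_transcripts` and the file paths are hypothetical, and this is not the linked script's implementation, only a minimal approximation of its input preparation.

```python
import json


def extract_transcripts(manifest_path, corpus_path):
    """Copy the `text` field of each manifest entry to corpus_path, one per line.

    Assumes a JSON-lines manifest (one JSON object per line).
    Returns the number of transcripts written.
    """
    count = 0
    with open(manifest_path, encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:  # tolerate blank lines in the manifest
                continue
            entry = json.loads(line)
            text = entry.get("text", "").strip()
            if text:  # skip entries with an empty or missing transcript
                dst.write(text + "\n")
                count += 1
    return count
```

A text file produced this way could then serve as the training corpus for a SentencePiece model; the linked script handles that step, along with options such as vocabulary size, on its own.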
152
 
153
  ## Datasets
154
  All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising over a thousand hours of French speech:
177
  Further, since portions of the training set contain text written both before and after the 1990 orthographic reform, regularity of punctuation may vary between the two styles.
178
  For downstream tasks requiring more consistency, fine-tuning or downstream processing may be required. If exact orthography is not necessary, then using a secondary model is advised.
179
 
180
+ ## References
181
+
182
+ - [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
183
+
184
+ - [2] [Google SentencePiece Tokenizer](https://github.com/google/sentencepiece)
185
+
186
+ - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
187
 
188