nvidia/stt_fr_conformer_transducer_large

Automatic Speech Recognition

hf-asr-leaderboard

jbalam-nv committed on Jun 30, 2022

Commit 65327ea • 1 Parent(s): 33bdf78

Update README.md

Files changed (1)

README.md +8 -1

@@ -148,7 +148,7 @@ Conformer-Transducer model is an autoregressive variant of Conformer model [1] f
 The NeMo toolkit [3] was used for training the models for over several hundred epochs. These model are trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml).
-The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
+The sentence-piece tokenizers [2] for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 ## Datasets
 All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising of over a thousand hours of French speech:
@@ -177,5 +177,12 @@ Since this model was trained on publicly available speech datasets, the performa
 Further, since portions of the training set contain text from both pre- and post- 1990 orthographic reform, regularity of punctuation may vary between the two styles.
 For downstream tasks requiring more consistency, finetuning or downstream processing may be required. If exact orthography is not necessary, then using secondary model is advised.
+## References
+- [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
+- [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
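
The changed line states that the SentencePiece tokenizers were built from the text transcripts of the train set. As a rough illustration of the first step only, the sketch below collects transcripts into a plain-text corpus, one utterance per line, which is the kind of input a SentencePiece trainer consumes. It assumes a NeMo-style JSON-lines manifest whose entries carry a `text` field; the function name `manifest_to_corpus` and any file names are hypothetical and are not taken from the linked script.

```python
import json


def manifest_to_corpus(manifest_path, corpus_path):
    """Copy non-empty transcripts from a JSON-lines manifest to a text file.

    Assumption: each manifest line is a JSON object with a "text" field,
    as in NeMo ASR manifests. Returns the number of transcripts written.
    """
    count = 0
    with open(manifest_path, encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the manifest
            text = json.loads(line).get("text", "").strip()
            if text:
                dst.write(text + "\n")  # one transcript per line
                count += 1
    return count
```

Only train-set manifests should be passed in, so that the tokenizer vocabulary is not influenced by evaluation transcripts.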