nvidia/stt_en_conformer_transducer_xlarge

Automatic Speech Recognition NeMo PyTorch English speech audio Transducer Conformer Transformer NeMo hf-asr-leaderboard Eval Results


eharper committed on Jun 21, 2022

Commit 574765b, 1 parent: a7e0aba

Update README.md


README.md +12 -10


@@ -191,17 +191,7 @@ This model accepts 16000 KHz Mono-channel Audio (wav files) as input.
 This model provides transcribed speech as a string for a given audio sample.
-## NVIDIA Riva: Deployment
-If you like this and other models from NVIDIA (i.e., CTC-based Conformers) check out [NVIDIA Riva](https://developer.nvidia.com/riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. This model, as well as other RNNT-based models are currently not supported by Riva. You can find the list of models supported by Riva [here](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/models/index.html).
-Additionally, Riva provides:
-* World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
-* Best in class accuracy via customization with run-time word boosting (e.g., brand and product names), acoustic model training, language model training, and inverse text normalization customizations
-* Streaming speech recognition, Kubernetes compatible scaling, and Enterprise-grade support
-Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 ## Model Architecture
@@ -242,6 +232,18 @@ The list of the available models in this collection is shown in the following ta
 ## Limitations
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
+## NVIDIA Riva: Deployment
+If you like this and other models from NVIDIA (i.e., CTC-based Conformers) check out [NVIDIA Riva](https://developer.nvidia.com/riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. This model, as well as other RNNT-based models are currently not supported by Riva. You can find the list of models supported by Riva [here](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/models/index.html).
+Additionally, Riva provides:
+* World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
+* Best in class accuracy via customization with run-time word boosting (e.g., brand and product names), acoustic model training, language model training, and inverse text normalization customizations
+* Streaming speech recognition, Kubernetes compatible scaling, and Enterprise-grade support
+Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 ## References
 [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
 [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)