okuchaiev rleary committed on
Commit
75925ff
1 Parent(s): 69936c5

Update README.md with improved frontmatter. (#1)


- Update README.md with improved frontmatter. (1b6a7f64be7a09e36247f6c08ae5ba2810037184)


Co-authored-by: Ryan Leary <rleary@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +26 -14
README.md CHANGED
@@ -142,26 +142,38 @@ model-index:
  type: wer
  value: 7.0
  ---
- ## Model Overview

- This model transcribes speech in lower case English alphabet along with spaces and apostrophes.
- It is a "large" versions of Conformer-CTC (around 120M parameters) model.

- ## NVIDIA NeMo: Training

- To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
- ```
- pip install nemo_toolkit['all']
- ```
-
- ## NVIDIA Riva: Deployment

- For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva](#deployment-with-nvidia-riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.

- ## How to Use this Model

  The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

  ### Automatically instantiate the model

  ```python
@@ -189,7 +201,7 @@ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py

  ### Input

- This model accepts 16000 KHz Mono-channel Audio (wav files) as input.

  ### Output

@@ -197,7 +209,7 @@ This model provides transcribed speech as a string for a given audio sample.

  ## Model Architecture

- Conformer-CTC model is a non-autoregressive variant of Conformer model [1] for Automatic Speech Recognition which uses CTC loss/decoding instead of Transducer. You may find more info on the detail of this model here: [Conformer-CTC Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html).

  ## Training

  type: wer
  value: 7.0
  ---

+ <style type="text/css">
+ img[src$='#model-badge'] {
+ display:inline;
+ margin-bottom:0;
+ margin-top:0;
+ }
+ </style>
 
+ # NVIDIA Conformer-CTC Large (en-US)

+ [![Model architecture](https://img.shields.io/badge/Model_Arch-Conformer--CTC-lightgrey#model-badge)](#model-architecture)
+ | [![Model size](https://img.shields.io/badge/Params-120M-lightgrey#model-badge)](#model-architecture)
+ | [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
+ | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva)

+ This model transcribes speech in the lowercase English alphabet, including spaces and apostrophes, and is trained on several thousand hours of English speech data.
+ It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
+ See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
+ It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).

+
+ ## Usage

  The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

+ To train, fine-tune or play with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.
+
+ ```
+ pip install nemo_toolkit['all']
+ ```
+
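A minimal end-to-end sketch of this workflow, assuming the checkpoint is published under the name `nvidia/stt_en_conformer_ctc_large` and that `sample.wav` is a 16 kHz mono recording (both are placeholder names):

```python
# Sketch: load a pre-trained Conformer-CTC checkpoint with NeMo and transcribe one file.
# The checkpoint name and audio path below are assumed placeholders.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/stt_en_conformer_ctc_large"  # assumed published model name
)

# transcribe() takes a list of paths to 16 kHz mono WAV files and returns the transcripts
transcriptions = asr_model.transcribe(["sample.wav"])
print(transcriptions)
```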
  ### Automatically instantiate the model

  ```python

  ### Input

+ This model accepts 16 kHz (16,000 Hz) mono-channel audio (wav files) as input.

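If a recording is not already in this format, it can be converted first; a minimal sketch using librosa and soundfile (the file names are placeholders):

```python
# Sketch: resample an arbitrary recording to the 16 kHz mono WAV format the model expects.
# "input.mp3" and "sample.wav" are placeholder paths.
import librosa
import soundfile as sf

audio, sr = librosa.load("input.mp3", sr=16000, mono=True)  # resample and downmix to mono
sf.write("sample.wav", audio, 16000)                        # write 16 kHz mono WAV
```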
  ### Output

  ## Model Architecture

+ Conformer-CTC is a non-autoregressive variant of the Conformer model [1] for Automatic Speech Recognition that uses CTC loss/decoding instead of a Transducer. You can find more details about this model here: [Conformer-CTC Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc).
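To illustrate what CTC decoding means in practice, here is a toy sketch of greedy CTC decoding (per-frame argmax, merge repeats, drop blanks); NeMo performs this internally, and this snippet is not part of the model's API:

```python
# Toy sketch of greedy CTC decoding: argmax per frame, collapse repeats, drop the blank token.
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, vocab: list, blank_id: int) -> str:
    """log_probs has shape [time, vocab_size] (frame-level log-probabilities)."""
    frame_ids = log_probs.argmax(axis=-1)                      # best token per frame
    collapsed = [t for i, t in enumerate(frame_ids)
                 if i == 0 or t != frame_ids[i - 1]]           # merge consecutive repeats
    return "".join(vocab[t] for t in collapsed if t != blank_id)

# toy 3-symbol vocabulary with the blank at index 2
vocab = ["a", "b", "_"]
log_probs = np.log(np.array([
    [0.8, 0.1, 0.1],   # frame 1 -> "a"
    [0.8, 0.1, 0.1],   # frame 2 -> "a" (repeat, merged)
    [0.1, 0.1, 0.8],   # frame 3 -> blank
    [0.1, 0.8, 0.1],   # frame 4 -> "b"
]))
print(ctc_greedy_decode(log_probs, vocab, blank_id=2))  # prints "ab"
```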

  ## Training