okuchaiev rleary committed on
Commit
75925ff
1 Parent(s): 69936c5

Update README.md with improved frontmatter. (#1)


- Update README.md with improved frontmatter. (1b6a7f64be7a09e36247f6c08ae5ba2810037184)


Co-authored-by: Ryan Leary <rleary@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +26 -14
README.md CHANGED
@@ -142,26 +142,38 @@ model-index:
  type: wer
  value: 7.0
  ---
- ## Model Overview

- This model transcribes speech in lower case English alphabet along with spaces and apostrophes.
- It is a "large" versions of Conformer-CTC (around 120M parameters) model.

- ## NVIDIA NeMo: Training

- To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
- ```
- pip install nemo_toolkit['all']
- ```
-
- ## NVIDIA Riva: Deployment

- For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva](#deployment-with-nvidia-riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.

- ## How to Use this Model

  The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

  ### Automatically instantiate the model

  ```python
@@ -189,7 +201,7 @@ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py

  ### Input

- This model accepts 16000 KHz Mono-channel Audio (wav files) as input.

  ### Output

@@ -197,7 +209,7 @@ This model provides transcribed speech as a string for a given audio sample.

  ## Model Architecture

- Conformer-CTC model is a non-autoregressive variant of Conformer model [1] for Automatic Speech Recognition which uses CTC loss/decoding instead of Transducer. You may find more info on the detail of this model here: [Conformer-CTC Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html).

  ## Training

  type: wer
  value: 7.0
  ---

+ <style type="text/css">
+ img[src$='#model-badge'] {
+ display:inline;
+ margin-bottom:0;
+ margin-top:0;
+ }
+ </style>
 
+ # NVIDIA Conformer-CTC Large (en-US)

+ [![Model architecture](https://img.shields.io/badge/Model_Arch-Conformer--CTC-lightgrey#model-badge)](#model-architecture)
+ | [![Model size](https://img.shields.io/badge/Params-120M-lightgrey#model-badge)](#model-architecture)
+ | [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
+ | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva)

+ This model transcribes speech in the lowercase English alphabet, including spaces and apostrophes, and is trained on several thousand hours of English speech data.
+ It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
+ See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
+ It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).

+
+ ## Usage

  The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

+ To train, fine-tune or play with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.
+
+ ```
+ pip install nemo_toolkit['all']
+ ```
+
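A minimal end-to-end sketch of this workflow, assuming the checkpoint is published under the name `nvidia/stt_en_conformer_ctc_large` and that `sample.wav` is a 16 kHz mono recording (both are placeholder names):

```python
# Sketch: load a pre-trained Conformer-CTC checkpoint with NeMo and transcribe one file.
# The checkpoint name and audio path below are assumed placeholders.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/stt_en_conformer_ctc_large"  # assumed published model name
)

# transcribe() takes a list of paths to 16 kHz mono WAV files and returns the transcripts
transcriptions = asr_model.transcribe(["sample.wav"])
print(transcriptions)
```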
  ### Automatically instantiate the model

  ```python

  ### Input

+ This model accepts 16 kHz (16,000 Hz) mono-channel audio (wav files) as input.

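If a recording is not already in this format, it can be converted first; a minimal sketch using librosa and soundfile (the file names are placeholders):

```python
# Sketch: resample an arbitrary recording to the 16 kHz mono WAV format the model expects.
# "input.mp3" and "sample.wav" are placeholder paths.
import librosa
import soundfile as sf

audio, sr = librosa.load("input.mp3", sr=16000, mono=True)  # resample and downmix to mono
sf.write("sample.wav", audio, 16000)                        # write 16 kHz mono WAV
```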
  ### Output

  ## Model Architecture

+ Conformer-CTC is a non-autoregressive variant of the Conformer model [1] for Automatic Speech Recognition that uses CTC loss/decoding instead of a Transducer. You can find more details about this model here: [Conformer-CTC Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc).
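To illustrate what CTC decoding means in practice, here is a toy sketch of greedy CTC decoding (per-frame argmax, merge repeats, drop blanks); NeMo performs this internally, and this snippet is not part of the model's API:

```python
# Toy sketch of greedy CTC decoding: argmax per frame, collapse repeats, drop the blank token.
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, vocab: list, blank_id: int) -> str:
    """log_probs has shape [time, vocab_size] (frame-level log-probabilities)."""
    frame_ids = log_probs.argmax(axis=-1)                      # best token per frame
    collapsed = [t for i, t in enumerate(frame_ids)
                 if i == 0 or t != frame_ids[i - 1]]           # merge consecutive repeats
    return "".join(vocab[t] for t in collapsed if t != blank_id)

# toy 3-symbol vocabulary with the blank at index 2
vocab = ["a", "b", "_"]
log_probs = np.log(np.array([
    [0.8, 0.1, 0.1],   # frame 1 -> "a"
    [0.8, 0.1, 0.1],   # frame 2 -> "a" (repeat, merged)
    [0.1, 0.1, 0.8],   # frame 3 -> blank
    [0.1, 0.8, 0.1],   # frame 4 -> "b"
]))
print(ctc_greedy_decode(log_probs, vocab, blank_id=2))  # prints "ab"
```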

  ## Training