Update README.md
README.md
CHANGED
````diff
@@ -152,7 +152,11 @@ It is a "large" versions of Conformer-CTC (around 120M parameters) model.
 To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
 ```
 pip install nemo_toolkit['all']
 ```
 
+## NVIDIA Riva: Deployment
+
+For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva](#deployment-with-nvidia-riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.
+
 ## How to Use this Model
 
@@ -191,18 +195,6 @@ This model accepts 16000 KHz Mono-channel Audio (wav files) as input.
 
 This model provides transcribed speech as a string for a given audio sample.
 
-## NVIDIA Riva: Deployment
-
-For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva], an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.
-
-Additionally, Riva provides:
-
-* World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
-* Best in class accuracy via customization with run-time word boosting (e.g., brand and product names), acoustic model training, language model training, and inverse text normalization customizations
-* Streaming speech recognition, Kubernetes compatible scaling, and Enterprise-grade support
-
-Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
-
 ## Model Architecture
 
 Conformer-CTC model is a non-autoregressive variant of Conformer model [1] for Automatic Speech Recognition which uses CTC loss/decoding instead of Transducer. You may find more info on the detail of this model here: [Conformer-CTC Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html).
@@ -253,9 +245,19 @@ While deploying with [NVIDIA Riva](https://developer.nvidia.com/riva), you can c
 
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 
+## Deployment with NVIDIA Riva
 
+For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva](https://developer.nvidia.com/riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.
 
+Additionally, Riva provides:
+
+* World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
+* Best in class accuracy via customization with run-time word boosting (e.g., brand and product names), acoustic model training, language model training, and inverse text normalization customizations
+* Streaming speech recognition, Kubernetes compatible scaling, and Enterprise-grade support
+
+Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
+
+## References
-
 
 [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
 
````
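The second hunk is anchored on the card's note that the model accepts 16000 KHz mono-channel wav input (presumably 16000 Hz, i.e. 16 kHz). A quick way to check a file before transcription, using only the Python standard library; the helper name is an illustrative assumption, not part of NeMo:

```python
# Sketch: check that a .wav file matches the expected input format
# (mono, 16 kHz sample rate) before passing it to the model.
import wave


def is_valid_input(path, expected_rate=16000):
    """Return True if the wav file at `path` is mono at expected_rate Hz."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == expected_rate
```

Files at other rates or with multiple channels can be converted first, for example with sox or ffmpeg.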