Commit 1f0724b by nithinraok
1 Parent(s): e9993e4

Update README.md

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -21,12 +21,12 @@ tags:
  - automatic-speech-recognition
  - speech
  - audio
- - Transducer
  - FastConformer
  - Conformer
  - pytorch
  - NeMo
  - hf-asr-leaderboard
+ - ctc
  license: cc-by-4.0
  widget:
  - example_title: Librispeech sample 1
@@ -117,7 +117,7 @@ model-index:
  metrics:
  - name: Test WER
  type: wer
- value: 4.20
+ value: 4.2
  - task:
  type: Automatic Speech Recognition
  name: automatic-speech-recognition
@@ -160,7 +160,6 @@ model-index:
  - name: Test WER
  type: wer
  value: 9.02
-
  metrics:
  - wer
  pipeline_tag: automatic-speech-recognition
@@ -179,7 +178,7 @@ img {
  | [![Language](https://img.shields.io/badge/Language-en-lightgrey#model-badge)](#datasets)


- parakeet-rnnt-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Suno.ai](https://www.suno.ai/) teams.
+ parakeet-ctc-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Suno.ai](https://www.suno.ai/) teams.
  It is an XXL version of FastConformer CTC [1] (around 1.1B parameters) model.
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.

@@ -198,7 +197,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre

  ```python
  import nemo.collections.asr as nemo_asr
- asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")
+ asr_model = nemo_asr.models.EncDecCTCBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")
  ```

  ### Transcribing using Python
@@ -259,7 +258,7 @@ The training dataset consists of private subset with 40K hours of English speech

  The performance of Automatic Speech Recognition models is measuring using Word Error Rate. Since this dataset is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general.

- The following tables summarizes the performance of the available models in this collection with the Transducer decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+ The following tables summarizes the performance of the available models in this collection with the CTC decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.

  |**Version**|**Tokenizer**|**Vocabulary Size**|**AMI**|**Earnings-22**|**Giga Speech**|**LS test-clean**|**SPGI Speech**|**TEDLIUM-v3**|**Vox Populi**|**Common Voice**|
  |---------|-----------------------|-----------------|---------------|---------------|------------|-----------|-----|-------|------|------|
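
Note on the metric: the README lines touched by this diff report results as Word Error Rate (WER%). For readers unfamiliar with it, here is a minimal sketch of how WER is commonly computed, as the word-level edit distance between a reference and a hypothesis divided by the number of reference words. The function name `word_error_rate` is hypothetical and is not part of NeMo. The sketch assumes plain whitespace tokenization and no text normalization, so it may not reproduce the figures in the model card, which depend on the evaluation pipeline actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count.

    Illustrative helper only (not a NeMo API); assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substituted word out of three reference words.
print(round(word_error_rate("the cat sat", "the bat sat"), 2))  # 33.33
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.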