Automatic Speech Recognition · NeMo · PyTorch · 4 languages · automatic-speech-translation · speech · audio · Transformer · FastConformer · Conformer · hf-asr-leaderboard · Eval Results
nithinraok committed
Commit e2ec446 · 1 Parent(s): 3246124

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -286,7 +286,7 @@ The Canary-1B model has 24 encoder layers and 24 decoder layers in total.
 
 To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed Cython and the latest PyTorch version.
 ```
-pip install nemo_toolkit['all']
+pip install git+https://github.com/NVIDIA/NeMo.git@r1.23.0#egg=nemo_toolkit[all]
 ```
 
 
@@ -408,9 +408,7 @@ The tokenizers for these models were built using the text transcripts of the tra
 
 ### Datasets
 
-The Canary-1B model is trained on 70K hours of speech audio with transcriptions in their original languages for ASR, and machine-generated translations for each supported language for speech translation.
-
-The training data contains 43K hours of English speech collected and prepared by NVIDIA NeMo and [Suno](https://suno.ai/) teams, and an in-house subset with 27K hours of English/German/Spanish/French speech.
+The Canary-1B model is trained on a total of 85k hrs of speech data. It consists of 31k hrs of public data, 20k hrs collected by [Suno](https://suno.ai/), and 34k hrs of in-house data.
 
 
 ## Performance
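
After applying the pinned install above, a quick way to confirm that the toolkit and checkpoint work together is to load the model through NeMo's Python API and transcribe a short clip. The following is a minimal sketch, not part of the diff: it assumes NeMo r1.23.0 installed as shown, the `nvidia/canary-1b` checkpoint on the Hugging Face Hub, and a hypothetical local 16 kHz mono WAV file `sample_audio.wav`.

```python
# Minimal sketch (assumption: NeMo r1.23.0 installed via the command in the diff).
# EncDecMultiTaskModel is the NeMo class used for Canary-style multitask checkpoints.
from nemo.collections.asr.models import EncDecMultiTaskModel

# Downloads the .nemo checkpoint from the Hugging Face Hub and restores the model.
canary_model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

# First argument is a list of audio file paths; "sample_audio.wav" is a
# hypothetical placeholder for a local 16 kHz mono WAV file.
predicted_text = canary_model.transcribe(["sample_audio.wav"], batch_size=1)
print(predicted_text[0])
```

Pinning the `r1.23.0` tag from GitHub, rather than installing `nemo_toolkit['all']` from PyPI, presumably keeps the installed code aligned with the release this model card targets.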