Automatic Speech Recognition · NeMo · PyTorch · 4 languages · automatic-speech-translation · speech · audio · Transformer · FastConformer · Conformer · hf-asr-leaderboard · Eval Results
nithinraok committed
Commit e2ec446 · 1 Parent(s): 3246124

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -286,7 +286,7 @@ The Canary-1B model has 24 encoder layers and 24 decoder layers in total.
 
 To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed Cython and the latest PyTorch version.
 ```
-pip install nemo_toolkit['all']
+pip install git+https://github.com/NVIDIA/NeMo.git@r1.23.0#egg=nemo_toolkit[all]
 ```
 
 
@@ -408,9 +408,7 @@ The tokenizers for these models were built using the text transcripts of the tra
 
 ### Datasets
 
-The Canary-1B model is trained on 70K hours of speech audio with transcriptions in their original languages for ASR, and machine-generated translations for each supported language for speech translation.
-
-The training data contains 43K hours of English speech collected and prepared by NVIDIA NeMo and [Suno](https://suno.ai/) teams, and an in-house subset with 27K hours of English/German/Spanish/French speech.
+The Canary-1B model is trained on a total of 85k hrs of speech data. It consists of 31k hrs of public data, 20k hrs collected by [Suno](https://suno.ai/), and 34k hrs of in-house data.
 
 
 ## Performance
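
After applying the pinned install above, a quick way to confirm that the toolkit and checkpoint work together is to load the model through NeMo's Python API and transcribe a short clip. The following is a minimal sketch, not part of the diff: it assumes NeMo r1.23.0 installed as shown, the `nvidia/canary-1b` checkpoint on the Hugging Face Hub, and a hypothetical local 16 kHz mono WAV file `sample_audio.wav`.

```python
# Minimal sketch (assumption: NeMo r1.23.0 installed via the command in the diff).
# EncDecMultiTaskModel is the NeMo class used for Canary-style multitask checkpoints.
from nemo.collections.asr.models import EncDecMultiTaskModel

# Downloads the .nemo checkpoint from the Hugging Face Hub and restores the model.
canary_model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

# First argument is a list of audio file paths; "sample_audio.wav" is a
# hypothetical placeholder for a local 16 kHz mono WAV file.
predicted_text = canary_model.transcribe(["sample_audio.wav"], batch_size=1)
print(predicted_text[0])
```

Pinning the `r1.23.0` tag from GitHub, rather than installing `nemo_toolkit['all']` from PyPI, presumably keeps the installed code aligned with the release this model card targets.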