Files changed (1): README.md (+3 -2)
@@ -276,7 +276,7 @@ NVIDIA [NeMo Canary](https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/) i
 
 Canary is an encoder-decoder model with FastConformer [1] encoder and Transformer Decoder [2].
 With audio features extracted from the encoder, task tokens such as `<source language>`, `<target language>`, `<task>` and `<toggle PnC>`
-are fed into the Transformer Decoder to trigger the text generation process. Canary uses a concatenated tokenizer from individual
+are fed into the Transformer Decoder to trigger the text generation process. Canary uses a concatenated tokenizer [5] from individual
 SentencePiece [3] tokenizers of each language, which makes it easy to scale up to more languages.
 The Canary-1B model has 24 encoder layers and 24 decoder layers in total.
 
@@ -479,7 +479,7 @@ BLEU score on [FLEURS](https://huggingface.co/datasets/google/fleurs) test set:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
-| 1.23.0 | canary-1b | 22.66 | 41.11 | 40.76 | 32.64 | 32.15 | 23.57 |
+| 1.23.0 | canary-1b | 32.13 | 22.66 | 40.76 | 33.98 | 21.80 | 30.95 |
 
 
 BLEU score on [COVOST-v2](https://github.com/facebookresearch/covost) test set:
@@ -518,6 +518,7 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 [4] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
+[5] [Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer](https://aclanthology.org/2023.calcs-1.7.pdf)
 
 ## Licence
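The concatenated-tokenizer scheme cited in the added reference [5] can be illustrated with a minimal sketch: each language keeps its own SentencePiece vocabulary, and the combined tokenizer offsets every language's token IDs by the sizes of the vocabularies before it, so IDs never collide and a new language can be appended without retraining the existing tokenizers. The class and toy vocabularies below are hypothetical, for illustration only; the real Canary model uses trained SentencePiece models loaded through NeMo.

```python
class ConcatenatedTokenizer:
    """Toy illustration of a concatenated tokenizer: per-language
    sub-tokenizers keep their own vocabularies, and the combined ID
    space is formed by offsetting each language's local IDs."""

    def __init__(self, vocabs):
        # vocabs: {lang: {token: local_id}}, standing in for
        # per-language SentencePiece models.
        self.vocabs = vocabs
        self.offsets = {}
        offset = 0
        for lang, vocab in vocabs.items():
            self.offsets[lang] = offset
            offset += len(vocab)
        # Reverse map: global ID -> token, for decoding.
        self.id_to_token = {
            self.offsets[lang] + local_id: tok
            for lang, vocab in vocabs.items()
            for tok, local_id in vocab.items()
        }

    def encode(self, tokens, lang):
        # Map already-segmented tokens to global IDs for one language.
        base = self.offsets[lang]
        return [base + self.vocabs[lang][t] for t in tokens]

    def decode(self, ids):
        return [self.id_to_token[i] for i in ids]


# Hypothetical two-token vocabularies per language.
en_vocab = {"\u2581hello": 0, "\u2581world": 1}
de_vocab = {"\u2581hallo": 0, "\u2581welt": 1}
tok = ConcatenatedTokenizer({"en": en_vocab, "de": de_vocab})

ids_en = tok.encode(["\u2581hello", "\u2581world"], lang="en")  # [0, 1]
ids_de = tok.encode(["\u2581hallo", "\u2581welt"], lang="de")   # [2, 3] (offset by len(en_vocab))
assert tok.decode(ids_en + ids_de) == ["\u2581hello", "\u2581world", "\u2581hallo", "\u2581welt"]
```

Because a new language only appends a fresh offset block at the end of the ID space, existing token IDs are untouched, which is the property that makes the scheme easy to scale to more languages.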