Files changed (1): README.md (+3 -2)
@@ -276,7 +276,7 @@ NVIDIA [NeMo Canary](https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/) i
 
 Canary is an encoder-decoder model with FastConformer [1] encoder and Transformer Decoder [2].
 With audio features extracted from the encoder, task tokens such as `<source language>`, `<target language>`, `<task>` and `<toggle PnC>`
-are fed into the Transformer Decoder to trigger the text generation process. Canary uses a concatenated tokenizer from individual
+are fed into the Transformer Decoder to trigger the text generation process. Canary uses a concatenated tokenizer [5] from individual
 SentencePiece [3] tokenizers of each language, which makes it easy to scale up to more languages.
 The Canary-1B model has 24 encoder layers and 24 decoder layers in total.
 
@@ -479,7 +479,7 @@ BLEU score on [FLEURS](https://huggingface.co/datasets/google/fleurs) test set:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
-| 1.23.0 | canary-1b | 22.66 | 41.11 | 40.76 | 32.64 | 32.15 | 23.57 |
+| 1.23.0 | canary-1b | 32.13 | 22.66 | 40.76 | 33.98 | 21.80 | 30.95 |
 
 
 BLEU score on [COVOST-v2](https://github.com/facebookresearch/covost) test set:
@@ -518,6 +518,7 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 [4] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
+[5] [Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer](https://aclanthology.org/2023.calcs-1.7.pdf)
 
 ## Licence
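The concatenated-tokenizer scheme cited in the added reference [5] can be illustrated with a minimal sketch: each language keeps its own SentencePiece vocabulary, and the combined tokenizer offsets every language's token IDs by the sizes of the vocabularies before it, so IDs never collide and a new language can be appended without retraining the existing tokenizers. The class and toy vocabularies below are hypothetical, for illustration only; the real Canary model uses trained SentencePiece models loaded through NeMo.

```python
class ConcatenatedTokenizer:
    """Toy illustration of a concatenated tokenizer: per-language
    sub-tokenizers keep their own vocabularies, and the combined ID
    space is formed by offsetting each language's local IDs."""

    def __init__(self, vocabs):
        # vocabs: {lang: {token: local_id}}, standing in for
        # per-language SentencePiece models.
        self.vocabs = vocabs
        self.offsets = {}
        offset = 0
        for lang, vocab in vocabs.items():
            self.offsets[lang] = offset
            offset += len(vocab)
        # Reverse map: global ID -> token, for decoding.
        self.id_to_token = {
            self.offsets[lang] + local_id: tok
            for lang, vocab in vocabs.items()
            for tok, local_id in vocab.items()
        }

    def encode(self, tokens, lang):
        # Map already-segmented tokens to global IDs for one language.
        base = self.offsets[lang]
        return [base + self.vocabs[lang][t] for t in tokens]

    def decode(self, ids):
        return [self.id_to_token[i] for i in ids]


# Hypothetical two-token vocabularies per language.
en_vocab = {"\u2581hello": 0, "\u2581world": 1}
de_vocab = {"\u2581hallo": 0, "\u2581welt": 1}
tok = ConcatenatedTokenizer({"en": en_vocab, "de": de_vocab})

ids_en = tok.encode(["\u2581hello", "\u2581world"], lang="en")  # [0, 1]
ids_de = tok.encode(["\u2581hallo", "\u2581welt"], lang="de")   # [2, 3] (offset by len(en_vocab))
assert tok.decode(ids_en + ids_de) == ["\u2581hello", "\u2581world", "\u2581hallo", "\u2581welt"]
```

Because a new language only appends a fresh offset block at the end of the ID space, existing token IDs are untouched, which is the property that makes the scheme easy to scale to more languages.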