nvidia
/

stt_uk_citrinet_1024_gamma_0_25

Automatic Speech Recognition

hf-asr-leaderboard

Model card Files Files and versions Community

okuchaiev commited on Aug 3, 2022

Commit

3e0eb70

•

1 Parent(s): 83b41fb

Update README.md

Files changed (1) hide show

README.md +5 -2

README.md CHANGED Viewed

@@ -125,7 +125,7 @@ img {
 | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |
 This model transcribes speech in lowercase Ukrainian alphabet including spaces and apostrophes, and is trained on 69 hours of Ukrainian speech data.
-It is a non-autoregressive "large" variant of Streaming Citrinet, with around 141 million parameters. Model is fine-tuned with pre-trained Russian Citrinet-1024 model on Ukrainian speech data using Cross-Language Transfer Learning [4] approach.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
 It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
@@ -180,9 +180,12 @@ The NeMo toolkit [3] was used for training the model for 1000 epochs. This model
 The tokenizer for this models was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 ### Datasets
-Model is trained on validated Mozilla Common Voice Corpus 10.0 dataset (excluding dev and test data) comprising of 69 hours of Ukrainian speech.
 ## Performance

 | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |
 This model transcribes speech in lowercase Ukrainian alphabet including spaces and apostrophes, and is trained on 69 hours of Ukrainian speech data.
+It is a non-autoregressive "large" variant of Streaming Citrinet, with around 141 million parameters. Model is fine-tuned from pre-trained Russian Citrinet-1024 model on Ukrainian speech data using Cross-Language Transfer Learning [4] approach.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
 It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
 The tokenizer for this models was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
+For details on Cross-Lingual transfer learning see [4].
 ### Datasets
+This model has been trained using validated Mozilla Common Voice Corpus 10.0 dataset (excluding dev and test data) comprising of 69 hours of Ukrainian speech. The Russian model from which this model is fine-tuned has been trained on the union of: (1) Mozilla Common Voice (V7 Ru), (2) Ru LibriSpeech (RuLS), (3) Sber GOLOS and (4) SOVA datasets.
 ## Performance