igitman committed on
Commit
9c4bf77
1 Parent(s): ad08092

Add links to SDP configs

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -191,9 +191,9 @@ The tokenizers for these models were built using the text transcripts of the tra
 
 The model in this collection are trained on a composite dataset (NeMo PnC IT ASRSET) comprising of 487 hours of Italian speech:
 
-- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning
-- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning
-- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning
+- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mcv/config.yaml).
+- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mls/config.yaml).
+- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/voxpopuli/config.yaml).
 
 ## Performance
 