nljubesi committed on
Commit
d6133fb
1 Parent(s): 1cd2542

Added references for the ParlaSpeech-HR dataset

  1. README.md +6 -2
README.md CHANGED
@@ -97,7 +97,7 @@ Full config can be found inside the `.nemo` files.
 
  ### Datasets
 
- All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+ All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 
  ## Performance
 
@@ -130,4 +130,8 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
  - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
- - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+ - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+ - [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+
+ - [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)