smajumdar94 nljubesi committed on
Commit
0b588e1
1 Parent(s): 7af1b98

Adding links to the ParlaSpeech dataset / paper (#1)

- Adding links to the ParlaSpeech dataset / paper (00343de3bc3c9f2d67b19610b6ef4d767298357b)


Co-authored-by: Nikola Ljubešić <nljubesi@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -95,7 +95,7 @@ Full config can be found inside the `.nemo` files.
 
  ### Datasets
 
- All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+ All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 
  ## Performance
 
@@ -117,4 +117,10 @@ Since the model is trained just on ParlaSpeech-HR v1.0 dataset, the performance
 
  - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
- - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+ - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+ - [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+
+ - [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)
+
+ -