Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ This is a monolingual Slovak Wav2Vec 2.0 base model pre-trained from 17 thousand

This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data.

-The model was initialized from [fav-kky/wav2vec2-base-cs-80k-ClTRUS](https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS)
+The model was initialized from the Czech pre-trained model [fav-kky/wav2vec2-base-cs-80k-ClTRUS](https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS). We found this cross-language transfer learning approach better than pre-training from scratch. See our paper for details.

## Pretraining data
Almost 18 thousand hours of unlabeled Slovak speech:
@@ -51,24 +51,29 @@ After fine-tuning, the model scored the following results on public datasets:

See our paper for details.

## Paper
-The preprint of our paper (accepted to TSD 2023) is available at
+The preprint of our paper (accepted to TSD 2023) is available at https://arxiv.org/abs/2306.04399.

## Citation
If you find this model useful, please cite our paper:
```
-@inproceedings{wav2vec2-base-
+@inproceedings{wav2vec2-base-sk-17k,
  title = {{Transfer Learning of Transformer-based Speech Recognition Models from Czech to Slovak}},
  author = {
    Jan Lehe\v{c}ka and
    Josef V. Psutka and
    Josef Psutka
  },
-  booktitle = {{
-  publisher = {{Springer}},
-  year = {
+  booktitle = {{Text, Speech, and Dialogue}},
+  publisher = {{Springer International Publishing}},
+  year = {2023},
  note = {(in press)},
+  url = {https://arxiv.org/abs/2306.04399},
}
```

-## Related
+## Related papers
+- [INTERSPEECH 2022 - Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech](https://www.isca-speech.org/archive/pdfs/interspeech_2022/lehecka22_interspeech.pdf)
+- INTERSPEECH 2023 - Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech
+
+## Related models
- [fav-kky/wav2vec2-base-cs-80k-ClTRUS](https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS)
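As a rough illustration of the cross-language initialization the updated card describes, here is a minimal sketch assuming the Hugging Face `transformers` API; the card does not state which pre-training framework the authors actually used:

```python
# Minimal sketch (assumption: transformers' Wav2Vec2ForPreTraining class):
# warm-start Slovak wav2vec 2.0 pre-training from the Czech checkpoint named
# in the card, rather than from a random initialization.
from transformers import Wav2Vec2ForPreTraining

model = Wav2Vec2ForPreTraining.from_pretrained("fav-kky/wav2vec2-base-cs-80k-ClTRUS")
# ... continue self-supervised pre-training on the unlabeled Slovak audio
# listed under "Pretraining data" ...
```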
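Since the card notes the model ships without a tokenizer, a hedged sketch of preparing it for CTC fine-tuning may also help. The repository id `fav-kky/wav2vec2-base-sk-17k` is inferred from the citation key and may differ, and `vocab.json` is a hypothetical character vocabulary built from your labeled Slovak transcripts:

```python
# Hedged sketch: build a tokenizer and attach a fresh CTC head for fine-tuning.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Character-level tokenizer from a vocab.json built from your transcripts
# (hypothetical file; the model card ships no tokenizer).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)

# Feature extractor matching the usual 16 kHz wav2vec 2.0 base setup.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pre-trained encoder; the CTC head is newly initialized and sized
# to the new vocabulary. The repo id below is an assumption.
model = Wav2Vec2ForCTC.from_pretrained(
    "fav-kky/wav2vec2-base-sk-17k",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # common practice when fine-tuning wav2vec 2.0
# ... fine-tune with CTC loss on labeled Slovak speech ...
```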