# wav2vec2-base-cs-80k-ClTRUS

**C**zech **l**anguage **TR**ansformer from **U**nlabeled **S**peech (ClTRUS) is a monolingual Czech Wav2Vec 2.0 base model pre-trained on 80 thousand hours of Czech speech.

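Because the checkpoint has no tokenizer or CTC head, it can be used directly as a speech-representation extractor. The snippet below is a minimal sketch with the Hugging Face `transformers` library; the Hub id is assumed from the model name, the feature-extractor settings are common wav2vec 2.0 defaults rather than values confirmed for this repository, and the silent waveform only stands in for real 16 kHz Czech audio.

```python
# Minimal sketch: extracting frame-level speech representations from the pre-trained encoder.
# Hub id, feature-extractor settings, and the dummy waveform are illustrative assumptions.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "fav-kky/wav2vec2-base-cs-80k-ClTRUS"  # assumed repository id

# Standard wav2vec 2.0 front-end: raw 16 kHz mono waveforms, zero-padded in batches.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, return_attention_mask=True
)
model = Wav2Vec2Model.from_pretrained(MODEL_ID).eval()

waveform = np.zeros(16_000, dtype=np.float32)  # 1 second of silence as a placeholder
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(inputs.input_values)

# One 768-dimensional vector per ~20 ms frame for a base-sized model.
print(outputs.last_hidden_state.shape)
```
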
This model does not have a tokenizer, as it was pre-trained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled text data, for example as sketched below.

**Note:** This is a checkpoint of the model after 4 epochs over the whole dataset. If you need an earlier or later checkpoint, feel free to contact the author (jlehecka(at)kky.zcu.cz).

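The following is a minimal sketch of how a CTC fine-tuning setup could be wired together with the Hugging Face `transformers` library. The Hub id is assumed from the model name, and the toy vocabulary, dummy audio, and transcript are placeholders; the actual recipe (vocabulary, data, hyperparameters) is described in the paper, not here.

```python
# Minimal sketch: attaching a CTC head and tokenizer to the pre-trained encoder.
# Vocabulary, audio, transcript, and hyperparameters are toy placeholders.
import json
import numpy as np
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

MODEL_ID = "fav-kky/wav2vec2-base-cs-80k-ClTRUS"  # assumed repository id

# Toy character vocabulary; a real one would cover the full Czech alphabet with diacritics.
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
vocab["|"] = len(vocab)       # word delimiter
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)   # also used as the CTC blank token
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, return_attention_mask=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pre-trained encoder; the CTC head is newly initialized to match the vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    MODEL_ID,
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # the convolutional front-end is commonly kept frozen

# One toy training step on dummy audio with a dummy transcript.
waveform = np.random.randn(16_000).astype(np.float32)
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
labels = tokenizer("dobry den", return_tensors="pt").input_ids
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
print(float(loss))
```

In practice this would be wrapped in a training loop (e.g. `Trainer`) over a labeled Czech corpus; the snippet only shows how the pieces fit together.
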
## Pretraining data

More than 80 thousand hours of unlabeled Czech speech:
- recordings from radio (22k hours),
- unlabeled data from the VoxPopuli dataset (18.7k hours),
- TV shows (15k hours),
- shadow speakers (12k hours),
- sports (5k hours),
- telephone data (2k hours),
- and a smaller amount of data from several other domains, including the public CommonVoice dataset.

## Speech recognition results

After fine-tuning, the model achieved the following results on public datasets:
- Czech portion of CommonVoice v7.0: **WER = 3.8%**
- Czech portion of VoxPopuli: **WER = 8.8%**

See our paper for details.

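For reference, word error rate on such benchmarks is typically obtained by decoding the test split with a fine-tuned model and comparing against the reference transcripts. The sketch below illustrates this with a hypothetical fine-tuned checkpoint, greedy CTC decoding, and the `datasets`/`jiwer` libraries; it is not the evaluation pipeline from the paper, and the gated CommonVoice dataset requires accepting its terms on the Hugging Face Hub.

```python
# Illustrative WER computation with greedy decoding; not the paper's evaluation setup.
# "your-org/wav2vec2-cs-asr" is a placeholder for a model fine-tuned from this checkpoint.
import torch
from datasets import Audio, load_dataset
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

FINETUNED_ID = "your-org/wav2vec2-cs-asr"  # hypothetical fine-tuned ASR model
processor = Wav2Vec2Processor.from_pretrained(FINETUNED_ID)
model = Wav2Vec2ForCTC.from_pretrained(FINETUNED_ID).eval()

# Czech test split of CommonVoice v7.0 (gated dataset), resampled to 16 kHz.
ds = load_dataset("mozilla-foundation/common_voice_7_0", "cs", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds.select(range(100)):  # small subset to keep the sketch quick
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(pred_ids)[0].lower())
    references.append(sample["sentence"].lower())

print(f"WER: {wer(references, predictions):.3f}")
```

Note that text normalization (casing, punctuation) strongly affects WER, so numbers from such a sketch are not directly comparable to the results reported above.
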
## Paper

The preprint of our paper (accepted to INTERSPEECH 2022) is available at http://arxiv.org/abs/2206.07627.

## Citation

If you find this model useful, please cite our paper:

```
@inproceedings{wav2vec2-base-cs-80k-ClTRUS,
  title = {Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of {C}zech},
  author = {Jan Lehe\v{c}ka and Jan \v{S}vec and Ale\v{s} Pra\v{z}\'ak and Josef V. Psutka},
  booktitle = {Interspeech 2022},
  publisher = {{ISCA}},
  year = {2022},
  note = {(in press)},
  url = {https://arxiv.org/abs/2206.07627},
}
```

## Other papers using this model

- [Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project](https://arxiv.org/abs/2206.07666)