ESPnet
102 languages
audio
self-supervised-learning
speech-recognition
wanchichen committed
Commit d4cf80d
Parent: 78a2f20

Update README.md

Files changed (1): README.md (+10, -0)
README.md CHANGED
@@ -127,6 +127,16 @@ This model was trained by [William Chen](https://wanchichen.github.io/) using ES
  WavLabLM is a self-supervised audio encoder pre-trained on 40,000 hours of multilingual data across 136 languages. This specific variant, WavLabLM-MS, went through a second stage of pre-training on a balanced subset of the data to improve performance on lower-resource languages.
  It achieves comparable performance to XLS-R 128 on the [ML-SUPERB Benchmark](https://arxiv.org/abs/2305.10615) with only 10% of the pre-training data.

+ ```BibTex
+ @misc{chen2023joint,
+   title={Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning},
+   author={William Chen and Jiatong Shi and Brian Yan and Dan Berrebbi and Wangyou Zhang and Yifan Peng and Xuankai Chang and Soumi Maiti and Shinji Watanabe},
+   year={2023},
+   eprint={2309.15317},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```


  ### Citing ESPnet
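
As a usage illustration, here is a minimal sketch of extracting frame-level features from a frozen SSL encoder such as WavLabLM-MS through S3PRL's generic upstream interface (`s3prl.nn.S3PRLUpstream`), the same kind of frozen-feature setup that ML-SUPERB evaluates. The upstream identifier `wavlablm_ms_40k` below is an assumption for illustration only; check the S3PRL upstream registry or this model card's files for the actual name or checkpoint path.

```python
import torch
from s3prl.nn import S3PRLUpstream

# NOTE: "wavlablm_ms_40k" is a hypothetical upstream name used for illustration;
# consult S3PRL's registered upstreams / this model card for the real identifier.
model = S3PRLUpstream("wavlablm_ms_40k")
model.eval()

# Two dummy 16 kHz waveforms, zero-padded to the same tensor length.
wavs = torch.randn(2, 2 * 16000)
wavs_len = torch.LongTensor([2 * 16000, 16000])

with torch.no_grad():
    all_hs, all_hs_len = model(wavs, wavs_len)

# all_hs is a list of per-layer hidden states, each of shape (batch, frames, hidden_dim);
# downstream probes typically learn a weighted sum over these layers.
print(len(all_hs), all_hs[0].shape)
```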