ESPnet | 102 languages | audio | self-supervised-learning | speech-recognition
wanchichen committed 09ffb0d (1 parent: 921b02e)

Create README.md

Files changed (1)
  1. README.md +170 -0
README.md ADDED
@@ -0,0 +1,170 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - self-supervised-learning
+ - speech-recognition
+ multilinguality:
+ - multilingual
+ task_categories:
+ - automatic-speech-recognition
+ language:
+ - afr
+ - amh
+ - ara
+ - asm
+ - ast
+ - azj
+ - bel
+ - ben
+ - bos
+ - cat
+ - ceb
+ - cmn
+ - ces
+ - cym
+ - dan
+ - deu
+ - ell
+ - eng
+ - spa
+ - est
+ - fas
+ - ful
+ - fin
+ - tgl
+ - fra
+ - gle
+ - glg
+ - guj
+ - hau
+ - heb
+ - hin
+ - hrv
+ - hun
+ - hye
+ - ind
+ - ibo
+ - isl
+ - ita
+ - jpn
+ - jav
+ - kat
+ - kam
+ - kea
+ - kaz
+ - khm
+ - kan
+ - kor
+ - ckb
+ - kir
+ - ltz
+ - lug
+ - lin
+ - lao
+ - lit
+ - luo
+ - lav
+ - mri
+ - mkd
+ - mal
+ - mon
+ - mar
+ - msa
+ - mlt
+ - mya
+ - nob
+ - npi
+ - nld
+ - nso
+ - nya
+ - oci
+ - orm
+ - ory
+ - pan
+ - pol
+ - pus
+ - por
+ - ron
+ - rus
+ - bul
+ - snd
+ - slk
+ - slv
+ - sna
+ - som
+ - srp
+ - swe
+ - swh
+ - tam
+ - tel
+ - tgk
+ - tha
+ - tur
+ - ukr
+ - umb
+ - urd
+ - uzb
+ - vie
+ - wol
+ - xho
+ - yor
+ - yue
+ - zul
+ datasets:
+ - fleurs
+ - babel
+ - voxpopuli
+ - commonvoice
+ license: cc-by-4.0
+ ---
+
+ ## WavLabLM-MK 40k
+
+ [Paper](https://arxiv.org/abs/2309.15317)
+
+ This model was trained by [William Chen](https://wanchichen.github.io/) using ESPnet2's SSL recipe in [espnet](https://github.com/espnet/espnet/).
+ WavLabLM is a self-supervised audio encoder pre-trained on 40,000 hours of multilingual data across 136 languages. This specific variant, WavLabLM-MK, uses a K-means model trained on multilingual data for quantization.
+ It is used as the base model for [WavLabLM-MS](https://huggingface.co/espnet/WavLabLM-MS-40k), which is better tuned for non-European languages.
+
+
+ ```bibtex
+ @misc{chen2023joint,
+ title={Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning},
+ author={William Chen and Jiatong Shi and Brian Yan and Dan Berrebbi and Wangyou Zhang and Yifan Peng and Xuankai Chang and Soumi Maiti and Shinji Watanabe},
+ year={2023},
+ eprint={2309.15317},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
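The README says WavLabLM-MK uses a K-means model trained on multilingual data for quantization. The following is an illustrative sketch of what K-means quantization of frame-level features means in general. It fits centroids on features pooled from many utterances, then maps each frame of a new utterance to the index of its nearest centroid, which yields a sequence of discrete unit IDs. This is not ESPnet's recipe code. The function names `kmeans_fit` and `kmeans_assign`, the random stand-in features, the 13-dimensional feature size, and the 8-cluster codebook are all hypothetical choices made for illustration. The features, cluster count, and training procedure actually used for WavLabLM-MK are not stated in this README. The sketch uses only the Python standard library so it runs anywhere.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_assign(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Quantize each frame to the index of its nearest centroid (ties -> lowest index)."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(f, centroids[k]))
        for f in frames
    ]


def kmeans_fit(
    frames: Sequence[Vector], num_clusters: int, num_iters: int = 20, seed: int = 0
) -> List[List[float]]:
    """Plain Lloyd's K-means over pooled frames; returns the list of centroids.

    Hypothetical illustration only, not the clustering used for WavLabLM-MK.
    """
    rng = random.Random(seed)
    # Initialise centroids from distinct, randomly chosen frames (copied, not aliased).
    centroids = [list(frames[i]) for i in rng.sample(range(len(frames)), num_clusters)]
    for _ in range(num_iters):
        labels = kmeans_assign(frames, centroids)
        for k in range(num_clusters):
            members = [f for f, lab in zip(frames, labels) if lab == k]
            if members:  # keep the previous centroid if cluster k is empty
                dim = len(members[0])
                centroids[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for features pooled across many languages:
    # 500 random 13-dim vectors, not real speech features.
    pooled = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(500)]
    codebook = kmeans_fit(pooled, num_clusters=8)

    # Quantize one 20-frame "utterance" into discrete unit IDs in [0, 8).
    utterance = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
    print(kmeans_assign(utterance, codebook))
```

In WavLM-style pre-training, discrete units of this kind typically serve as masked-prediction targets. The paper linked in the README describes how WavLabLM actually uses them. A production pipeline would use an optimized, minibatch K-means implementation rather than this pure-Python loop.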