antoniomae1234 committed on
Commit
88197c7
1 Parent(s): c5824de

Upload 6 files

Files changed (6)
  1. LICENSE +202 -0
  2. README.md +14 -0
  3. app.py +262 -0
  4. gitattributes +35 -0
  5. model.py +799 -0
  6. requirements.txt +4 -0
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
README.md ADDED
@@ -0,0 +1,14 @@
+---
+title: Text To Speech
+emoji: 🌍
+colorFrom: yellow
+colorTo: pink
+sdk: gradio
+sdk_version: 4.14.0
+python_version: 3.8.9
+app_file: app.py
+pinned: false
+license: apache-2.0
+---
+
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
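Reviewer note: the README added here starts with a front-matter block (the lines between the two `---` markers) that configures the Space. As a minimal sketch of how those settings can be read back for a quick sanity check, the snippet below splits an excerpt of that block into key/value strings. The helper name `parse_front_matter` is hypothetical, and this is deliberately not a YAML parser: it only handles flat `key: value` lines like the ones in this README.

```python
def parse_front_matter(text: str) -> dict:
    """Return the flat `key: value` pairs found between the leading '---' markers."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing marker: the rest is regular README text
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


# Excerpt of the front matter from the README.md in this commit.
readme = """---
title: Text To Speech
sdk: gradio
sdk_version: 4.14.0
python_version: 3.8.9
app_file: app.py
pinned: false
license: apache-2.0
---

Check out the configuration reference
"""

meta = parse_front_matter(readme)
print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```

All values come back as strings (for example `"false"` for `pinned`), so any type conversion is left to the caller.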
app.py ADDED
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+#
+# Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
+#
+# See LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# References:
+# https://gradio.app/docs/#dropdown
+
+import logging
+import os
+import time
+import uuid
+
+import gradio as gr
+import soundfile as sf
+
+from model import get_pretrained_model, language_to_models
+
+title = "# Next-gen Kaldi: Text-to-speech (TTS)"
+
+description = """
+This space shows how to convert text to speech with Next-gen Kaldi.
+
+It is running on CPU within a docker container provided by Hugging Face.
+
+See more information by visiting the following links:
+
+- <https://github.com/k2-fsa/sherpa-onnx>
+
+If you want to deploy it locally, please see
+<https://k2-fsa.github.io/sherpa/>
+
+If you want to use Android APKs, please see
+<https://k2-fsa.github.io/sherpa/onnx/tts/apk.html>
+
+If you want to use Android text-to-speech engine APKs, please see
+<https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html>
+
+If you want to download an all-in-one exe for Windows, please see
+<https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models>
+
+"""
+
+# css style is copied from
+# https://huggingface.co/spaces/alphacep/asr/blob/main/app.py#L113
+css = """
+.result {display:flex;flex-direction:column}
+.result_item {padding:15px;margin-bottom:8px;border-radius:15px;width:100%}
+.result_item_success {background-color:mediumaquamarine;color:white;align-self:start}
+.result_item_error {background-color:#ff7070;color:white;align-self:start}
+"""
+
+examples = [
+    [
+        "Chinese (Mandarin, 普通话)",
+        "csukuangfj/vits-zh-hf-fanchen-wnj|1",
+        "在一个阳光明媚的夏天,小马、小羊和小狗它们一块儿在广阔的草地上,嬉戏玩耍,这时小猴来了,还带着它心爱的足球活蹦乱跳地跑前、跑后教小马、小羊、小狗踢足球。",
+        0,
+        1.0,
+    ],
+    [
+        "Chinese (Mandarin, 普通话)",
+        "csukuangfj/vits-zh-hf-fanchen-C|187",
+        '小米的使命是,始终坚持做"感动人心、价格厚道"的好产品,让全球每个人都能享受科技带来的美好生活。',
+        0,
+        1.0,
+    ],
+    ["Min-nan (闽南话)", "csukuangfj/vits-mms-nan", "ài piaǸ chiah ē iaN̂", 0, 1.0],
+    ["Thai", "csukuangfj/vits-mms-tha", "ฉันรักคุณ", 0, 1.0],
+    [
+        "Chinese (Mandarin, 普通话)",
+        "csukuangfj/sherpa-onnx-vits-zh-ll|5",
+        "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。",
+        2,
+        1.0,
+    ],
+]
+
+
+def update_model_dropdown(language: str):
+    if language in language_to_models:
+        choices = language_to_models[language]
+        return gr.Dropdown(
+            choices=choices,
+            value=choices[0],
+            interactive=True,
+        )
+
+    raise ValueError(f"Unsupported language: {language}")
+
+
+def build_html_output(s: str, style: str = "result_item_success"):
+    return f"""
+    <div class='result'>
+        <div class='result_item {style}'>
+          {s}
+        </div>
+    </div>
+    """
+
+
+def process(language: str, repo_id: str, text: str, sid: str, speed: float):
+    logging.info(f"Input text: {text}. sid: {sid}, speed: {speed}")
+    sid = int(sid)
+    tts = get_pretrained_model(repo_id, speed)
+
+    start = time.time()
+    audio = tts.generate(text, sid=sid)
+    end = time.time()
+
+    if len(audio.samples) == 0:
+        raise ValueError(
+            "Error in generating audios. Please read previous error messages."
+        )
+
+    duration = len(audio.samples) / audio.sample_rate
+
+    elapsed_seconds = end - start
+    rtf = elapsed_seconds / duration
+
+    info = f"""
+    Wave duration : {duration:.3f} s <br/>
+    Processing time: {elapsed_seconds:.3f} s <br/>
+    RTF: {elapsed_seconds:.3f}/{duration:.3f} = {rtf:.3f} <br/>
+    """
+
+    logging.info(info)
+    logging.info(f"\nrepo_id: {repo_id}\ntext: {text}\nsid: {sid}\nspeed: {speed}")
+
+    filename = str(uuid.uuid4())
+    filename = f"{filename}.wav"
+    sf.write(
+        filename,
+        audio.samples,
+        samplerate=audio.sample_rate,
+        subtype="PCM_16",
+    )
+
+    return filename, build_html_output(info)
+
+
+demo = gr.Blocks(css=css)
+
+
+with demo:
+    gr.Markdown(title)
+    language_choices = list(language_to_models.keys())
+
+    language_radio = gr.Radio(
+        label="Language",
+        choices=language_choices,
+        value=language_choices[0],
+    )
+
+    model_dropdown = gr.Dropdown(
+        choices=language_to_models[language_choices[0]],
+        label="Select a model",
+        value=language_to_models[language_choices[0]][0],
+    )
+
+    language_radio.change(
+        update_model_dropdown,
+        inputs=language_radio,
+        outputs=model_dropdown,
+    )
+
+    with gr.Tabs():
+        with gr.TabItem("Please input your text"):
+            input_text = gr.Textbox(
+                label="Input text",
+                info="Your text",
+                lines=3,
+                placeholder="Please input your text here",
+            )
+
+            input_sid = gr.Textbox(
+                label="Speaker ID",
+                info="Speaker ID",
+                lines=1,
+                max_lines=1,
+                value="0",
+                placeholder="Speaker ID. Valid only for multi-speaker model",
+            )
+
+            input_speed = gr.Slider(
+                minimum=0.1,
+                maximum=10,
+                value=1,
+                step=0.1,
+                label="Speed (larger->faster; smaller->slower)",
+            )
+
+            input_button = gr.Button("Submit")
+
+            output_audio = gr.Audio(label="Output")
+
+            output_info = gr.HTML(label="Info")
+
+        gr.Examples(
+            examples=examples,
+            fn=process,
+            inputs=[
+                language_radio,
+                model_dropdown,
+                input_text,
+                input_sid,
+                input_speed,
+            ],
+            outputs=[
+                output_audio,
+                output_info,
+            ],
+        )
+
+        input_button.click(
+            process,
+            inputs=[
+                language_radio,
+                model_dropdown,
+                input_text,
+                input_sid,
+                input_speed,
+            ],
+            outputs=[
+                output_audio,
+                output_info,
+            ],
+        )
+
+    gr.Markdown(description)
+
+
+def download_espeak_ng_data():
+    os.system(
+        """
+    cd /tmp
+    wget -qq https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
+    tar xf espeak-ng-data.tar.bz2
+    """
+    )
+
+
+if __name__ == "__main__":
+    download_espeak_ng_data()
+    formatter = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
+
+    logging.basicConfig(format=formatter, level=logging.INFO)
+
+    demo.launch()
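Reviewer note: `process` in app.py reports a real-time factor (RTF), computed as processing time divided by the duration of the generated audio, where the duration is the number of samples divided by the sample rate. The sketch below isolates that arithmetic so it can be checked without loading a model. The function name `real_time_factor` and the example numbers are illustrative assumptions, not part of the commit.

```python
def real_time_factor(num_samples: int, sample_rate: int, elapsed_seconds: float) -> float:
    """RTF = processing time / audio duration, as computed in `process`."""
    if num_samples <= 0 or sample_rate <= 0:
        # `process` raises on empty output as well; this guard avoids a division by zero.
        raise ValueError("need a non-empty waveform and a positive sample rate")
    duration = num_samples / sample_rate  # seconds of audio produced
    return elapsed_seconds / duration


# Illustrative numbers: 2 s of audio at 22050 Hz, generated in 0.5 s.
rtf = real_time_factor(num_samples=44100, sample_rate=22050, elapsed_seconds=0.5)
print(f"RTF = {rtf:.3f}")  # prints "RTF = 0.250"
```

An RTF below 1 means the audio was generated faster than it takes to play back.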
gitattributes ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
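Reviewer note: each line in the attributes file above pairs a filename pattern with `filter=lfs`, so matching files are stored through Git LFS. As a rough sketch of which filenames a few of those patterns would catch, the snippet below uses Python's `fnmatch` as a stand-in. This is an approximation under stated assumptions: Git applies its own gitignore-style matching rules (for example for `saved_model/**/*`), which `fnmatch` does not fully reproduce, so only simple basename patterns are included. The helper name `tracked_by_lfs` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns listed in the attributes file above.
LFS_PATTERNS = ["*.onnx", "*.bz2", "*.tar.*", "*.zip", "*tfevents*"]


def tracked_by_lfs(filename: str) -> bool:
    """Approximate check: does any pattern match this basename (case-sensitive)?"""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


for name in ["model.onnx", "espeak-ng-data.tar.bz2", "app.py", "requirements.txt"]:
    print(name, tracked_by_lfs(name))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually resolves.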
model.py ADDED
@@ -0,0 +1,799 @@
+# Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
+#
+# See LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from functools import lru_cache
+from pathlib import Path
+
+import sherpa_onnx
+from huggingface_hub import hf_hub_download
+
+
+def get_file(
+    repo_id: str,
+    filename: str,
+    subfolder: str = ".",
+) -> str:
+    model_filename = hf_hub_download(
+        repo_id=repo_id,
+        filename=filename,
+        subfolder=subfolder,
+    )
+    return model_filename
+
+
+@lru_cache(maxsize=10)
+def _get_vits_vctk(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    assert repo_id == "csukuangfj/vits-vctk"
+
+    model = get_file(
+        repo_id=repo_id,
+        filename="vits-vctk.onnx",
+        subfolder=".",
+    )
+
+    lexicon = get_file(
+        repo_id=repo_id,
+        filename="lexicon.txt",
+        subfolder=".",
+    )
+
+    tokens = get_file(
+        repo_id=repo_id,
+        filename="tokens.txt",
+        subfolder=".",
+    )
+
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                model=model,
+                lexicon=lexicon,
+                tokens=tokens,
+                length_scale=1.0 / speed,
+            ),
+            provider="cpu",
+            debug=True,
+            num_threads=2,
+        )
+    )
+    tts = sherpa_onnx.OfflineTts(tts_config)
+
+    return tts
+
+
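Reviewer note: the loader above is wrapped in `lru_cache(maxsize=10)` and takes `(repo_id, speed)` as arguments, so the cache key includes the speed, and `speed` is turned into `length_scale = 1.0 / speed`. The sketch below uses a stand-in builder to show the consequence: repeating the same pair reuses the cached object, while a different speed for the same repository builds a new one. `load_model`, `build_calls`, and the repository name are hypothetical, and the smaller `maxsize` is only to keep the demo short.

```python
from functools import lru_cache

build_calls = []  # records every time the stand-in builder actually runs


@lru_cache(maxsize=2)
def load_model(repo_id: str, speed: float) -> dict:
    # Hypothetical stand-in for an expensive model constructor.
    build_calls.append((repo_id, speed))
    return {"repo_id": repo_id, "length_scale": 1.0 / speed}


a = load_model("demo/repo", 1.0)
b = load_model("demo/repo", 1.0)  # cache hit: same object, builder not called again
c = load_model("demo/repo", 2.0)  # different speed: separate cache entry

print(a is b)                   # True
print(len(build_calls))         # 2
print(c["length_scale"])        # 0.5
print(load_model.cache_info())  # hit/miss counters maintained by lru_cache
```

Since app.py exposes speed as a slider with many possible values, each distinct value selected for a model would occupy its own cache slot under this keying.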
+@lru_cache(maxsize=10)
+def _get_vits_ljs(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    assert repo_id == "csukuangfj/vits-ljs"
+
+    model = get_file(
+        repo_id=repo_id,
+        filename="vits-ljs.onnx",
+        subfolder=".",
+    )
+
+    lexicon = get_file(
+        repo_id=repo_id,
+        filename="lexicon.txt",
+        subfolder=".",
+    )
+
+    tokens = get_file(
+        repo_id=repo_id,
+        filename="tokens.txt",
+        subfolder=".",
+    )
+
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                model=model,
+                lexicon=lexicon,
+                tokens=tokens,
+                length_scale=1.0 / speed,
+            ),
+            provider="cpu",
+            debug=True,
+            num_threads=2,
+        )
+    )
+    tts = sherpa_onnx.OfflineTts(tts_config)
+
+    return tts
+
+
+@lru_cache(maxsize=10)
+def _get_vits_piper(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    data_dir = "/tmp/espeak-ng-data"
+    if "coqui" in repo_id or "vits-mms" in repo_id:
+        name = "model"
+    elif "piper" in repo_id:
+        n = len("vits-piper-")
+        name = repo_id.split("/")[1][n:]
+    elif "mimic3" in repo_id:
+        n = len("vits-mimic3-")
+        name = repo_id.split("/")[1][n:]
+    else:
+        raise ValueError(f"Unsupported {repo_id}")
+
+    if "vits-coqui-uk-mai" in repo_id or "vits-mms" in repo_id:
+        data_dir = ""
+
+    model = get_file(
+        repo_id=repo_id,
+        filename=f"{name}.onnx",
+        subfolder=".",
+    )
+
+    tokens = get_file(
+        repo_id=repo_id,
+        filename="tokens.txt",
+        subfolder=".",
+    )
+
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                model=model,
+                lexicon="",
+                data_dir=data_dir,
+                tokens=tokens,
+                length_scale=1.0 / speed,
+            ),
+            provider="cpu",
+            debug=True,
+            num_threads=2,
+        )
+    )
+    tts = sherpa_onnx.OfflineTts(tts_config)
+
+    return tts
+
+
+@lru_cache(maxsize=10)
+def _get_vits_mms(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    return _get_vits_piper(repo_id, speed)
+
+
+@lru_cache(maxsize=10)
+def _get_vits_zh_aishell3(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    assert repo_id == "csukuangfj/vits-zh-aishell3"
+
+    model = get_file(
+        repo_id=repo_id,
+        filename="vits-aishell3.onnx",
+        subfolder=".",
+    )
+
+    lexicon = get_file(
+        repo_id=repo_id,
+        filename="lexicon.txt",
+        subfolder=".",
+    )
+
+    tokens = get_file(
+        repo_id=repo_id,
+        filename="tokens.txt",
+        subfolder=".",
+    )
+
+    rule_fsts = ["phone.fst", "date.fst", "number.fst", "new_heteronym.fst"]
+
+    rule_fsts = [
+        get_file(
+            repo_id=repo_id,
+            filename=f,
+            subfolder=".",
+        )
+        for f in rule_fsts
+    ]
+    rule_fsts = ",".join(rule_fsts)
+
+    rule_fars = get_file(
+        repo_id=repo_id,
+        filename="rule.far",
+        subfolder=".",
+    )
+
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                model=model,
+                lexicon=lexicon,
+                tokens=tokens,
+                length_scale=1.0 / speed,
+            ),
+            provider="cpu",
+            debug=True,
+            num_threads=2,
+        ),
+        rule_fsts=rule_fsts,
+        rule_fars=rule_fars,
+    )
+    tts = sherpa_onnx.OfflineTts(tts_config)
+
+    return tts
+
+
+@lru_cache(maxsize=10)
+def _get_vits_hf(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    repo_id = repo_id.split("|")[0]
+
+    if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
+        model = repo_id.split("/")[-1]
+    else:
+        model = repo_id.split("-")[-1]
+
+    if "sherpa-onnx-vits-zh-ll" in repo_id:
+        model = "model"
+
+    if not Path("/tmp/dict").is_dir():
+        os.system(
+            "cd /tmp; curl -SL -O https://github.com/csukuangfj/cppjieba/releases/download/sherpa-onnx-2024-04-19/dict.tar.bz2; tar xvf dict.tar.bz2"
+        )
+        os.system("ls -lh /tmp/dict")
+
+    model = get_file(
+        repo_id=repo_id,
+        filename=f"{model}.onnx",
+        subfolder=".",
+    )
+
+    lexicon = get_file(
+        repo_id=repo_id,
+        filename="lexicon.txt",
+        subfolder=".",
+    )
+
+    tokens = get_file(
+        repo_id=repo_id,
+        filename="tokens.txt",
+        subfolder=".",
+    )
+
+    rule_fars = ""
+
+    if "vits-cantonese-hf-xiaomaiiwn" not in repo_id:
+        rule_fsts = ["phone.fst", "date.fst", "number.fst", "new_heteronym.fst"]
+
+        rule_fsts = [
+            get_file(
+                repo_id=repo_id,
+                filename=f,
+                subfolder=".",
+            )
+            for f in rule_fsts
+        ]
+        rule_fsts = ",".join(rule_fsts)
+
+        # rule_fars = get_file(
+        #     repo_id=repo_id,
+        #     filename="rule.far",
+        #     subfolder=".",
+        # )
+        vits_dict_dir = "/tmp/dict"
+    else:
+        rule_fsts = get_file(
+            repo_id=repo_id,
+            filename="rule.fst",
+            subfolder=".",
+        )
+        vits_dict_dir = ""
+
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                model=model,
+                lexicon=lexicon,
+                tokens=tokens,
+                dict_dir=vits_dict_dir,
+                length_scale=1.0 / speed,
+            ),
+            provider="cpu",
+            debug=True,
+            num_threads=2,
+        ),
+        rule_fsts=rule_fsts,
+        rule_fars=rule_fars,
+    )
+    tts = sherpa_onnx.OfflineTts(tts_config)
+
+    return tts
+
+
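Reviewer note: `_get_vits_hf` derives the ONNX filename from the dropdown value in three steps: it drops everything after `|`, picks either the full repository name or its last `-`-separated piece, and finally overrides the result for the `sherpa-onnx-vits-zh-ll` repository. The sketch below copies just that string logic into a pure function so the mapping can be checked without downloading anything. The function name `hf_model_filename` and the `example-user/...` repository id are hypothetical; the other ids are taken from the examples in app.py.

```python
def hf_model_filename(repo_id: str) -> str:
    """Mirror the filename derivation at the top of `_get_vits_hf`."""
    repo_id = repo_id.split("|")[0]  # the part after '|' is not used for the filename

    if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
        name = repo_id.split("/")[-1]  # whole repository name
    else:
        name = repo_id.split("-")[-1]  # last dash-separated piece

    if "sherpa-onnx-vits-zh-ll" in repo_id:
        name = "model"

    return f"{name}.onnx"


for rid in [
    "csukuangfj/vits-zh-hf-fanchen-wnj|1",
    "csukuangfj/sherpa-onnx-vits-zh-ll|5",
    "example-user/vits-zh-hf-demo|3",  # hypothetical id, for the else-branch
]:
    print(rid, "->", hf_model_filename(rid))
```

Keeping the derivation in a small pure function like this would also make it straightforward to unit-test when new repositories are added.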
+@lru_cache(maxsize=10)
+def get_pretrained_model(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+    if repo_id in chinese_models:
+        return chinese_models[repo_id](repo_id, speed)
+    if repo_id in cantonese_models:
+        return cantonese_models[repo_id](repo_id, speed)
+    elif repo_id in english_models:
+        return english_models[repo_id](repo_id, speed)
+    elif repo_id in german_models:
+        return german_models[repo_id](repo_id, speed)
+    elif repo_id in spanish_models:
+        return spanish_models[repo_id](repo_id, speed)
+    elif repo_id in french_models:
+        return french_models[repo_id](repo_id, speed)
+    elif repo_id in ukrainian_models:
+        return ukrainian_models[repo_id](repo_id, speed)
+    elif repo_id in russian_models:
+        return russian_models[repo_id](repo_id, speed)
+    elif repo_id in arabic_models:
+        return arabic_models[repo_id](repo_id, speed)
+    elif repo_id in catalan_models:
+        return catalan_models[repo_id](repo_id, speed)
+    elif repo_id in czech_models:
+        return czech_models[repo_id](repo_id, speed)
+    elif repo_id in danish_models:
+        return danish_models[repo_id](repo_id, speed)
+    elif repo_id in greek_models:
+        return greek_models[repo_id](repo_id, speed)
+    elif repo_id in finnish_models:
+        return finnish_models[repo_id](repo_id, speed)
+    elif repo_id in hungarian_models:
+        return hungarian_models[repo_id](repo_id, speed)
+    elif repo_id in icelandic_models:
+        return icelandic_models[repo_id](repo_id, speed)
+    elif repo_id in italian_models:
+        return italian_models[repo_id](repo_id, speed)
+    elif repo_id in georgian_models:
+        return georgian_models[repo_id](repo_id, speed)
+    elif repo_id in kazakh_models:
+        return kazakh_models[repo_id](repo_id, speed)
+    elif repo_id in luxembourgish_models:
+        return luxembourgish_models[repo_id](repo_id, speed)
+    elif repo_id in nepali_models:
+        return nepali_models[repo_id](repo_id, speed)
+    elif repo_id in dutch_models:
+        return dutch_models[repo_id](repo_id, speed)
+    elif repo_id in norwegian_models:
+        return norwegian_models[repo_id](repo_id, speed)
+    elif repo_id in polish_models:
+        return polish_models[repo_id](repo_id, speed)
+    elif repo_id in portuguese_models:
+        return portuguese_models[repo_id](repo_id, speed)
+    elif repo_id in romanian_models:
+        return romanian_models[repo_id](repo_id, speed)
+    elif repo_id in slovak_models:
+        return slovak_models[repo_id](repo_id, speed)
+    elif repo_id in serbian_models:
374
+ return serbian_models[repo_id](repo_id, speed)
375
+ elif repo_id in swedish_models:
376
+ return swedish_models[repo_id](repo_id, speed)
377
+ elif repo_id in swahili_models:
378
+ return swahili_models[repo_id](repo_id, speed)
379
+ elif repo_id in turkish_models:
380
+ return turkish_models[repo_id](repo_id, speed)
381
+ elif repo_id in vietnamese_models:
382
+ return vietnamese_models[repo_id](repo_id, speed)
383
+ elif repo_id in bulgarian_models:
384
+ return bulgarian_models[repo_id](repo_id, speed)
385
+ elif repo_id in estonian_models:
386
+ return estonian_models[repo_id](repo_id, speed)
387
+ elif repo_id in irish_models:
388
+ return irish_models[repo_id](repo_id, speed)
389
+ elif repo_id in croatian_models:
390
+ return croatian_models[repo_id](repo_id, speed)
391
+ elif repo_id in lithuanian_models:
392
+ return lithuanian_models[repo_id](repo_id, speed)
393
+ elif repo_id in latvian_models:
394
+ return latvian_models[repo_id](repo_id, speed)
395
+ elif repo_id in maltese_models:
396
+ return maltese_models[repo_id](repo_id, speed)
397
+ elif repo_id in slovenian_models:
398
+ return slovenian_models[repo_id](repo_id, speed)
399
+ elif repo_id in bengali_models:
400
+ return bengali_models[repo_id](repo_id, speed)
401
+ elif repo_id in min_nan_models:
402
+ return min_nan_models[repo_id](repo_id, speed)
403
+ elif repo_id in thai_models:
404
+ return thai_models[repo_id](repo_id, speed)
405
+ elif repo_id in persian_models:
406
+ return persian_models[repo_id](repo_id, speed)
407
+ elif repo_id in korean_models:
408
+ return korean_models[repo_id](repo_id, speed)
409
+ elif repo_id in afrikaans_models:
410
+ return afrikaans_models[repo_id](repo_id, speed)
411
+ elif repo_id in gujarati_models:
412
+ return gujarati_models[repo_id](repo_id, speed)
413
+ elif repo_id in tswana_models:
414
+ return tswana_models[repo_id](repo_id, speed)
415
+ else:
416
+ raise ValueError(f"Unsupported repo_id: {repo_id}")
417
+
418
+
419
+ cantonese_models = {
420
+ "csukuangfj/vits-cantonese-hf-xiaomaiiwn": _get_vits_hf,
421
+ }
422
+
423
+ chinese_models = {
424
+ "csukuangfj/vits-zh-hf-fanchen-wnj|1": _get_vits_hf, # 1
425
+ "csukuangfj/vits-zh-hf-fanchen-C|187": _get_vits_hf, # 187
426
+     "csukuangfj/sherpa-onnx-vits-zh-ll|5": _get_vits_hf, # 5
427
+ "csukuangfj/vits-zh-hf-keqing|804": _get_vits_hf, # 804
428
+ "csukuangfj/vits-zh-hf-theresa|804": _get_vits_hf, # 804
429
+ "csukuangfj/vits-zh-hf-eula|804": _get_vits_hf, # 804
430
+ "csukuangfj/vits-zh-hf-echo|804": _get_vits_hf, # 804
431
+ "csukuangfj/vits-zh-hf-bronya|804": _get_vits_hf, # 804
432
+ "csukuangfj/vits-zh-hf-doom|804": _get_vits_hf, # 804
433
+ "csukuangfj/vits-zh-hf-zenyatta|804": _get_vits_hf, # 804
434
+ "csukuangfj/vits-zh-hf-abyssinvoker|804": _get_vits_hf, # 804
435
+ "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe|1": _get_vits_hf, # 1
436
+ "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe_new|1": _get_vits_hf, # 1
437
+ "csukuangfj/vits-zh-hf-fanchen-unity|1": _get_vits_hf, # 1
438
+ "csukuangfj/vits-zh-aishell3": _get_vits_zh_aishell3,
439
+ "csukuangfj/vits-piper-zh_CN-huayan-medium": _get_vits_piper,
440
+ # "csukuangfj/vits-piper-zh_CN-huayan-x_low": _get_vits_piper,
441
+ }
442
+
443
+ english_models = {
444
+ "csukuangfj/vits-piper-en_US-glados": _get_vits_piper,
445
+ # coqui-ai
446
+ "csukuangfj/vits-coqui-en-ljspeech": _get_vits_piper,
447
+ "csukuangfj/vits-coqui-en-ljspeech-neon": _get_vits_piper,
448
+ "csukuangfj/vits-coqui-en-vctk": _get_vits_piper,
449
+ # piper, US
450
+ "csukuangfj/vits-piper-en_GB-sweetbbak-amy": _get_vits_piper,
451
+ "csukuangfj/vits-piper-en_US-amy-low": _get_vits_piper,
452
+ "csukuangfj/vits-piper-en_US-amy-medium": _get_vits_piper,
453
+ "csukuangfj/vits-piper-en_US-arctic-medium": _get_vits_piper, # 18 speakers
454
+ "csukuangfj/vits-piper-en_US-danny-low": _get_vits_piper,
455
+ "csukuangfj/vits-piper-en_US-hfc_male-medium": _get_vits_piper,
456
+ "csukuangfj/vits-piper-en_US-joe-medium": _get_vits_piper,
457
+ "csukuangfj/vits-piper-en_US-kathleen-low": _get_vits_piper,
458
+ "csukuangfj/vits-piper-en_US-kusal-medium": _get_vits_piper,
459
+ "csukuangfj/vits-piper-en_US-l2arctic-medium": _get_vits_piper, # 24 speakers
460
+ "csukuangfj/vits-piper-en_US-lessac-high": _get_vits_piper,
461
+ "csukuangfj/vits-piper-en_US-lessac-low": _get_vits_piper,
462
+ "csukuangfj/vits-piper-en_US-lessac-medium": _get_vits_piper,
463
+ "csukuangfj/vits-piper-en_US-libritts-high": _get_vits_piper, # 904 speakers
464
+ "csukuangfj/vits-piper-en_US-libritts_r-medium": _get_vits_piper, # 904 speakers
465
+ "csukuangfj/vits-piper-en_US-ljspeech-high": _get_vits_piper,
466
+ "csukuangfj/vits-piper-en_US-ljspeech-medium": _get_vits_piper,
467
+ "csukuangfj/vits-piper-en_US-ryan-high": _get_vits_piper,
468
+ "csukuangfj/vits-piper-en_US-ryan-low": _get_vits_piper,
469
+ "csukuangfj/vits-piper-en_US-ryan-medium": _get_vits_piper,
470
+ # piper, GB
471
+ "csukuangfj/vits-piper-en_GB-alan-low": _get_vits_piper,
472
+ "csukuangfj/vits-piper-en_GB-alan-medium": _get_vits_piper,
474
+ "csukuangfj/vits-piper-en_GB-cori-high": _get_vits_piper,
475
+ "csukuangfj/vits-piper-en_GB-cori-medium": _get_vits_piper,
476
+ "csukuangfj/vits-piper-en_GB-jenny_dioco-medium": _get_vits_piper,
477
+ "csukuangfj/vits-piper-en_GB-northern_english_male-medium": _get_vits_piper,
478
+ "csukuangfj/vits-piper-en_GB-semaine-medium": _get_vits_piper,
479
+ "csukuangfj/vits-piper-en_GB-southern_english_female-low": _get_vits_piper,
480
+ "csukuangfj/vits-piper-en_GB-vctk-medium": _get_vits_piper,
481
+ #
482
+ "csukuangfj/vits-vctk": _get_vits_vctk, # 109 speakers
483
+ "csukuangfj/vits-ljs": _get_vits_ljs,
484
+ }
485
+
486
+ german_models = {
487
+ "csukuangfj/vits-coqui-de-css10": _get_vits_piper,
488
+ "csukuangfj/vits-piper-de_DE-eva_k-x_low": _get_vits_piper,
489
+ "csukuangfj/vits-piper-de_DE-karlsson-low": _get_vits_piper,
490
+ "csukuangfj/vits-piper-de_DE-kerstin-low": _get_vits_piper,
491
+ # "csukuangfj/vits-piper-de_DE-mls-medium": _get_vits_piper,
492
+ "csukuangfj/vits-piper-de_DE-pavoque-low": _get_vits_piper,
493
+ "csukuangfj/vits-piper-de_DE-ramona-low": _get_vits_piper,
494
+ "csukuangfj/vits-piper-de_DE-thorsten-low": _get_vits_piper,
495
+ "csukuangfj/vits-piper-de_DE-thorsten-medium": _get_vits_piper,
496
+ "csukuangfj/vits-piper-de_DE-thorsten-high": _get_vits_piper,
497
+ "csukuangfj/vits-piper-de_DE-thorsten_emotional-medium": _get_vits_piper, # 8 speakers
498
+ }
499
+
500
+ spanish_models = {
501
+ # "csukuangfj/vits-coqui-es-css10": _get_vits_piper,
502
+ "csukuangfj/vits-piper-es-glados-medium": _get_vits_piper,
503
+ "csukuangfj/vits-piper-es_ES-carlfm-x_low": _get_vits_piper,
504
+ "csukuangfj/vits-piper-es_ES-davefx-medium": _get_vits_piper,
505
+ # "csukuangfj/vits-piper-es_ES-mls_10246-low": _get_vits_piper,
506
+ # "csukuangfj/vits-piper-es_ES-mls_9972-low": _get_vits_piper,
507
+ "csukuangfj/vits-piper-es_ES-sharvard-medium": _get_vits_piper, # 2 speakers
508
+ "csukuangfj/vits-piper-es_MX-ald-medium": _get_vits_piper,
509
+ "csukuangfj/vits-piper-es_MX-claude-high": _get_vits_piper,
510
+ "csukuangfj/vits-mimic3-es_ES-m-ailabs_low": _get_vits_piper,
511
+ }
512
+
513
+ french_models = {
514
+ "csukuangfj/vits-coqui-fr-css10": _get_vits_piper,
515
+ # "csukuangfj/vits-piper-fr_FR-gilles-low": _get_vits_piper,
516
+ # "csukuangfj/vits-piper-fr_FR-mls_1840-low": _get_vits_piper,
517
+     "csukuangfj/vits-piper-fr_FR-mls-medium": _get_vits_piper, # 2 speakers, 0-female, 1-male
518
+     "csukuangfj/vits-piper-fr_FR-upmc-medium": _get_vits_piper, # 2 speakers, 0-female, 1-male
519
+ "csukuangfj/vits-piper-fr_FR-siwis-low": _get_vits_piper, # female
520
+ "csukuangfj/vits-piper-fr_FR-siwis-medium": _get_vits_piper,
521
+ "csukuangfj/vits-piper-fr_FR-tjiho-model1": _get_vits_piper,
522
+ "csukuangfj/vits-piper-fr_FR-tjiho-model2": _get_vits_piper,
523
+ "csukuangfj/vits-piper-fr_FR-tjiho-model3": _get_vits_piper,
524
+ }
525
+
526
+ ukrainian_models = {
527
+ "csukuangfj/vits-piper-uk_UA-lada-x_low": _get_vits_piper,
528
+ "csukuangfj/vits-coqui-uk-mai": _get_vits_piper,
529
+ # "csukuangfj/vits-piper-uk_UA-ukrainian_tts-medium": _get_vits_piper, # does not work somehow
530
+ }
531
+
532
+ russian_models = {
533
+ "csukuangfj/vits-piper-ru_RU-denis-medium": _get_vits_piper,
534
+ "csukuangfj/vits-piper-ru_RU-dmitri-medium": _get_vits_piper,
535
+ "csukuangfj/vits-piper-ru_RU-irina-medium": _get_vits_piper,
536
+ "csukuangfj/vits-piper-ru_RU-ruslan-medium": _get_vits_piper,
537
+ }
538
+
539
+ arabic_models = {
540
+ "csukuangfj/vits-piper-ar_JO-kareem-low": _get_vits_piper,
541
+ "csukuangfj/vits-piper-ar_JO-kareem-medium": _get_vits_piper,
542
+ }
543
+
544
+ catalan_models = {
545
+ "csukuangfj/vits-piper-ca_ES-upc_ona-x_low": _get_vits_piper,
546
+ "csukuangfj/vits-piper-ca_ES-upc_ona-medium": _get_vits_piper,
547
+ "csukuangfj/vits-piper-ca_ES-upc_pau-x_low": _get_vits_piper,
548
+ }
549
+
550
+ czech_models = {
551
+ "csukuangfj/vits-piper-cs_CZ-jirka-low": _get_vits_piper,
552
+ "csukuangfj/vits-piper-cs_CZ-jirka-medium": _get_vits_piper,
553
+ "csukuangfj/vits-coqui-cs-cv": _get_vits_piper,
554
+ }
555
+
556
+ danish_models = {
557
+ "csukuangfj/vits-coqui-da-cv": _get_vits_piper,
558
+ "csukuangfj/vits-piper-da_DK-talesyntese-medium": _get_vits_piper,
559
+ }
560
+
561
+ greek_models = {
562
+ "csukuangfj/vits-piper-el_GR-rapunzelina-low": _get_vits_piper,
563
+ # "csukuangfj/vits-mimic3-el_GR-rapunzelina_low": _get_vits_piper,
564
+ }
565
+
566
+ finnish_models = {
567
+ "csukuangfj/vits-coqui-fi-css10": _get_vits_piper,
568
+ "csukuangfj/vits-piper-fi_FI-harri-low": _get_vits_piper,
569
+ "csukuangfj/vits-piper-fi_FI-harri-medium": _get_vits_piper,
570
+ "csukuangfj/vits-mimic3-fi_FI-harri-tapani-ylilammi_low": _get_vits_piper,
571
+ }
572
+
573
+ hungarian_models = {
574
+ # "csukuangfj/vits-coqui-hu-css10": _get_vits_piper,
575
+ "csukuangfj/vits-piper-hu_HU-anna-medium": _get_vits_piper,
576
+ "csukuangfj/vits-piper-hu_HU-berta-medium": _get_vits_piper,
577
+ "csukuangfj/vits-piper-hu_HU-imre-medium": _get_vits_piper,
578
+ "csukuangfj/vits-mimic3-hu_HU-diana-majlinger_low": _get_vits_piper,
579
+ }
580
+
581
+ icelandic_models = {
582
+ "csukuangfj/vits-piper-is_IS-bui-medium": _get_vits_piper,
583
+ "csukuangfj/vits-piper-is_IS-salka-medium": _get_vits_piper,
584
+ "csukuangfj/vits-piper-is_IS-steinn-medium": _get_vits_piper,
585
+ "csukuangfj/vits-piper-is_IS-ugla-medium": _get_vits_piper,
586
+ }
587
+
588
+ italian_models = {
589
+ "csukuangfj/vits-piper-it_IT-riccardo-x_low": _get_vits_piper,
590
+ }
591
+
592
+ georgian_models = {
593
+ "csukuangfj/vits-piper-ka_GE-natia-medium": _get_vits_piper,
594
+ }
595
+
596
+ kazakh_models = {
597
+ "csukuangfj/vits-piper-kk_KZ-iseke-x_low": _get_vits_piper,
598
+ "csukuangfj/vits-piper-kk_KZ-issai-high": _get_vits_piper,
599
+ "csukuangfj/vits-piper-kk_KZ-raya-x_low": _get_vits_piper,
600
+ }
601
+
602
+ luxembourgish_models = {
603
+ "csukuangfj/vits-piper-lb_LU-marylux-medium": _get_vits_piper,
604
+ }
605
+
606
+ nepali_models = {
607
+ "csukuangfj/vits-piper-ne_NP-google-medium": _get_vits_piper,
608
+ "csukuangfj/vits-piper-ne_NP-google-x_low": _get_vits_piper,
609
+ "csukuangfj/vits-mimic3-ne_NP-ne-google_low": _get_vits_piper,
610
+ }
611
+
612
+ dutch_models = {
613
+ "csukuangfj/vits-coqui-nl-css10": _get_vits_piper,
614
+ "csukuangfj/vits-piper-nl_BE-nathalie-medium": _get_vits_piper,
615
+ "csukuangfj/vits-piper-nl_BE-nathalie-x_low": _get_vits_piper,
616
+ "csukuangfj/vits-piper-nl_BE-rdh-medium": _get_vits_piper,
617
+ "csukuangfj/vits-piper-nl_BE-rdh-x_low": _get_vits_piper,
618
+ "csukuangfj/vits-piper-nl_NL-mls-medium": _get_vits_piper,
619
+ "csukuangfj/vits-piper-nl_NL-mls_5809-low": _get_vits_piper,
620
+ "csukuangfj/vits-piper-nl_NL-mls_7432-low": _get_vits_piper,
621
+ }
622
+
623
+ norwegian_models = {
624
+ "csukuangfj/vits-piper-no_NO-talesyntese-medium": _get_vits_piper,
625
+ }
626
+
627
+ polish_models = {
628
+ "csukuangfj/vits-coqui-pl-mai_female": _get_vits_piper,
629
+ "csukuangfj/vits-piper-pl_PL-darkman-medium": _get_vits_piper,
630
+ "csukuangfj/vits-piper-pl_PL-gosia-medium": _get_vits_piper,
631
+ "csukuangfj/vits-piper-pl_PL-mc_speech-medium": _get_vits_piper,
632
+ # "csukuangfj/vits-piper-pl_PL-mls_6892-low": _get_vits_piper,
633
+ "csukuangfj/vits-mimic3-pl_PL-m-ailabs_low": _get_vits_piper,
634
+ }
635
+
636
+ portuguese_models = {
637
+ "csukuangfj/vits-coqui-pt-cv": _get_vits_piper,
638
+ "csukuangfj/vits-piper-pt_BR-edresson-low": _get_vits_piper,
639
+ "csukuangfj/vits-piper-pt_BR-faber-medium": _get_vits_piper,
640
+ "csukuangfj/vits-piper-pt_PT-tugao-medium": _get_vits_piper,
641
+ }
642
+
643
+ romanian_models = {
644
+ "csukuangfj/vits-coqui-ro-cv": _get_vits_piper,
645
+ "csukuangfj/vits-piper-ro_RO-mihai-medium": _get_vits_piper,
646
+ }
647
+
648
+
649
+ slovak_models = {
650
+ "csukuangfj/vits-coqui-sk-cv": _get_vits_piper,
651
+ "csukuangfj/vits-piper-sk_SK-lili-medium": _get_vits_piper,
652
+ }
653
+
654
+ serbian_models = {
655
+ "csukuangfj/vits-piper-sr_RS-serbski_institut-medium": _get_vits_piper,
656
+ }
657
+
658
+ swedish_models = {
659
+ "csukuangfj/vits-coqui-sv-cv": _get_vits_piper,
660
+ "csukuangfj/vits-piper-sv_SE-nst-medium": _get_vits_piper,
661
+ }
662
+
663
+ swahili_models = {
664
+ "csukuangfj/vits-piper-sw_CD-lanfrica-medium": _get_vits_piper,
665
+ }
666
+
667
+ turkish_models = {
668
+ "csukuangfj/vits-piper-tr_TR-dfki-medium": _get_vits_piper,
669
+ "csukuangfj/vits-piper-tr_TR-fahrettin-medium": _get_vits_piper,
670
+ }
671
+
672
+ vietnamese_models = {
673
+ "csukuangfj/vits-piper-vi_VN-25hours_single-low": _get_vits_piper,
674
+ "csukuangfj/vits-piper-vi_VN-vais1000-medium": _get_vits_piper,
675
+ "csukuangfj/vits-piper-vi_VN-vivos-x_low": _get_vits_piper,
676
+ "csukuangfj/vits-mimic3-vi_VN-vais1000_low": _get_vits_piper,
677
+ }
678
+
679
+ bulgarian_models = {
680
+ "csukuangfj/vits-coqui-bg-cv": _get_vits_piper,
681
+ }
682
+
683
+ estonian_models = {
684
+ "csukuangfj/vits-coqui-et-cv": _get_vits_piper,
685
+ }
686
+
687
+ irish_models = {
688
+ "csukuangfj/vits-coqui-ga-cv": _get_vits_piper,
689
+ }
690
+
691
+ croatian_models = {
692
+ "csukuangfj/vits-coqui-hr-cv": _get_vits_piper,
693
+ }
694
+
695
+ lithuanian_models = {
696
+ "csukuangfj/vits-coqui-lt-cv": _get_vits_piper,
697
+ }
698
+
699
+ latvian_models = {
700
+ "csukuangfj/vits-coqui-lv-cv": _get_vits_piper,
701
+ }
702
+
703
+ maltese_models = {
704
+ "csukuangfj/vits-coqui-mt-cv": _get_vits_piper,
705
+ }
706
+
707
+ slovenian_models = {
708
+ "csukuangfj/vits-piper-sl_SI-artur-medium": _get_vits_piper,
709
+ "csukuangfj/vits-coqui-sl-cv": _get_vits_piper,
710
+ }
711
+
712
+ # Bangla
713
+ bengali_models = {
714
+ "csukuangfj/vits-coqui-bn-custom_female": _get_vits_piper,
715
+ "csukuangfj/vits-mimic3-bn-multi_low": _get_vits_piper,
716
+ }
717
+
718
+ min_nan_models = {
719
+ "csukuangfj/vits-mms-nan": _get_vits_mms,
720
+ }
721
+
722
+ thai_models = {
723
+ "csukuangfj/vits-mms-tha": _get_vits_mms,
724
+ }
725
+
726
+ persian_models = {
727
+ "csukuangfj/vits-piper-fa_IR-amir-medium": _get_vits_piper,
728
+ "csukuangfj/vits-piper-fa_IR-gyro-medium": _get_vits_piper,
729
+ "csukuangfj/vits-mimic3-fa-haaniye_low": _get_vits_piper,
730
+ }
731
+
732
+ korean_models = {
733
+ "csukuangfj/vits-mimic3-ko_KO-kss_low": _get_vits_piper,
734
+ }
735
+
736
+
737
+ afrikaans_models = {
738
+ "csukuangfj/vits-mimic3-af_ZA-google-nwu_low": _get_vits_piper,
739
+ }
740
+
741
+ gujarati_models = {
742
+ "csukuangfj/vits-mimic3-gu_IN-cmu-indic_low": _get_vits_piper,
743
+ }
744
+
745
+ tswana_models = {
746
+ "csukuangfj/vits-mimic3-tn_ZA-google-nwu_low": _get_vits_piper,
747
+ }
748
+
749
+
750
+ language_to_models = {
751
+ "English": list(english_models.keys()),
752
+ "Chinese (Mandarin, 普通话)": list(chinese_models.keys()),
753
+ "Cantonese (粤语)": list(cantonese_models.keys()),
754
+ "Min-nan (闽南话)": list(min_nan_models.keys()),
755
+ "Arabic": list(arabic_models.keys()),
756
+ "Afrikaans": list(afrikaans_models.keys()),
757
+ "Bengali": list(bengali_models.keys()),
758
+ "Bulgarian": list(bulgarian_models.keys()),
759
+ "Catalan": list(catalan_models.keys()),
760
+ "Croatian": list(croatian_models.keys()),
761
+ "Czech": list(czech_models.keys()),
762
+ "Danish": list(danish_models.keys()),
763
+ "Dutch": list(dutch_models.keys()),
764
+ "Estonian": list(estonian_models.keys()),
765
+ "Finnish": list(finnish_models.keys()),
766
+ "French": list(french_models.keys()),
767
+ "Georgian": list(georgian_models.keys()),
768
+ "German": list(german_models.keys()),
769
+ "Greek": list(greek_models.keys()),
770
+ "Gujarati": list(gujarati_models.keys()),
771
+ "Hungarian": list(hungarian_models.keys()),
772
+ "Icelandic": list(icelandic_models.keys()),
773
+ "Irish": list(irish_models.keys()),
774
+ "Italian": list(italian_models.keys()),
775
+ "Kazakh": list(kazakh_models.keys()),
776
+ "Korean": list(korean_models.keys()),
777
+ "Latvian": list(latvian_models.keys()),
778
+ "Lithuanian": list(lithuanian_models.keys()),
779
+ "Luxembourgish": list(luxembourgish_models.keys()),
780
+ "Maltese": list(maltese_models.keys()),
781
+ "Nepali": list(nepali_models.keys()),
782
+ "Norwegian": list(norwegian_models.keys()),
783
+ "Persian": list(persian_models.keys()),
784
+ "Polish": list(polish_models.keys()),
785
+ "Portuguese": list(portuguese_models.keys()),
786
+ "Romanian": list(romanian_models.keys()),
787
+ "Russian": list(russian_models.keys()),
788
+ "Serbian": list(serbian_models.keys()),
789
+ "Slovak": list(slovak_models.keys()),
790
+ "Slovenian": list(slovenian_models.keys()),
791
+ "Spanish": list(spanish_models.keys()),
792
+ "Swahili": list(swahili_models.keys()),
793
+ "Swedish": list(swedish_models.keys()),
794
+ "Thai": list(thai_models.keys()),
795
+ "Tswana": list(tswana_models.keys()),
796
+ "Turkish": list(turkish_models.keys()),
797
+ "Ukrainian": list(ukrainian_models.keys()),
798
+ "Vietnamese": list(vietnamese_models.keys()),
799
+ }
requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ https://huggingface.co/csukuangfj/sherpa-onnx-wheels/resolve/main/sherpa_onnx-1.9.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
2
+ #sherpa-onnx
3
+
4
+ soundfile
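Review note on requirements.txt: the first entry pins a prebuilt wheel by URL, and the wheel's file name encodes which interpreter and platform it targets. The sketch below splits a wheel file name into its tags following the standard wheel naming convention (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`). `WheelTags` and `parse_wheel_filename` are hypothetical helper names introduced for illustration; they are not part of this commit or of any library.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(url_or_name: str) -> WheelTags:
    """Split a wheel file name (or a URL ending in one) into its tags."""
    filename = url_or_name.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
    elif len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        dist, version, _build, py, abi, plat = parts
    else:
        raise ValueError(f"Unexpected wheel file name: {filename}")
    return WheelTags(dist, version, py, abi, plat)


if __name__ == "__main__":
    tags = parse_wheel_filename(
        "https://huggingface.co/csukuangfj/sherpa-onnx-wheels/resolve/main/"
        "sherpa_onnx-1.9.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    print(tags)
```

For the URL pinned above, the python and ABI tags are both `cp38`, so the wheel targets CPython 3.8 on x86_64 Linux. Running the Space on a different Python version would presumably require a different wheel or the commented-out `sherpa-onnx` package from the index.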