Delete openvoice
- openvoice/.gitignore +0 -13
- openvoice/LICENSE +0 -7
- openvoice/README.md +0 -70
- openvoice/demo_part1.ipynb +0 -236
- openvoice/demo_part2.ipynb +0 -195
- openvoice/demo_part3.ipynb +0 -145
- openvoice/docs/QA.md +0 -39
- openvoice/docs/USAGE.md +0 -83
- openvoice/requirements.txt +0 -16
- openvoice/resources/demo_speaker0.mp3 +0 -3
- openvoice/resources/demo_speaker1.mp3 +0 -3
- openvoice/resources/demo_speaker2.mp3 +0 -3
- openvoice/resources/example_reference.mp3 +0 -3
- openvoice/resources/framework-ipa.png +0 -0
- openvoice/resources/huggingface.png +0 -0
- openvoice/resources/lepton-hd.png +0 -0
- openvoice/resources/myshell-hd.png +0 -0
- openvoice/resources/openvoicelogo.jpg +0 -3
- openvoice/resources/tts-guide.png +0 -3
- openvoice/resources/voice-clone-guide.png +0 -3
- openvoice/setup.py +0 -45
openvoice/.gitignore
DELETED
@@ -1,13 +0,0 @@
__pycache__/
.ipynb_checkpoints/
processed
outputs
outputs_v2
checkpoints
checkpoints_v2
trash
examples*
.env
build
*.egg-info/
*.zip
openvoice/LICENSE
DELETED
@@ -1,7 +0,0 @@
Copyright 2024 MyShell.ai

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
openvoice/README.md
DELETED
@@ -1,70 +0,0 @@
<div align="center">
  <div> </div>
  <img src="resources/openvoicelogo.jpg" width="400"/>

[Paper](https://arxiv.org/abs/2312.01479) |
[Website](https://research.myshell.ai/open-voice) <br> <br>
<a href="https://trendshift.io/repositories/6161" target="_blank"><img src="https://trendshift.io/api/badge/repositories/6161" alt="myshell-ai%2FOpenVoice | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>

## Introduction

### OpenVoice V1

As detailed in our [paper](https://arxiv.org/abs/2312.01479) and [website](https://research.myshell.ai/open-voice), the advantages of OpenVoice are three-fold:

**1. Accurate Tone Color Cloning.**
OpenVoice can accurately clone the reference tone color and generate speech in multiple languages and accents.

**2. Flexible Voice Style Control.**
OpenVoice enables granular control over voice styles, such as emotion and accent, as well as other style parameters including rhythm, pauses, and intonation.

**3. Zero-shot Cross-lingual Voice Cloning.**
Neither the language of the generated speech nor the language of the reference speech needs to be present in the massive-speaker multi-lingual training dataset.

### OpenVoice V2

In April 2024, we released OpenVoice V2, which includes all features of V1 and adds:

**1. Better Audio Quality.**
OpenVoice V2 adopts a different training strategy that delivers better audio quality.

**2. Native Multi-lingual Support.**
English, Spanish, French, Chinese, Japanese, and Korean are natively supported in OpenVoice V2.

**3. Free Commercial Use.**
Since April 2024, both V2 and V1 have been released under the MIT License, free for commercial use.

[Video](https://github.com/myshell-ai/OpenVoice/assets/40556743/3cba936f-82bf-476c-9e52-09f0f417bb2f)

OpenVoice has been powering the instant voice cloning capability of [myshell.ai](https://app.myshell.ai/explore) since May 2023. By Nov 2023, the voice cloning model had been used tens of millions of times by users worldwide and had witnessed explosive user growth on the platform.

## Main Contributors

- [Zengyi Qin](https://www.qinzy.tech) at MIT
- [Wenliang Zhao](https://wl-zhao.github.io) at Tsinghua University
- [Xumin Yu](https://yuxumin.github.io) at Tsinghua University
- [Ethan Sun](https://twitter.com/ethan_myshell) at MyShell

## How to Use

Please see [usage](docs/USAGE.md) for detailed instructions.

## Common Issues

Please see [QA](docs/QA.md) for common questions and answers. We will update the question and answer list regularly.

## Citation

```
@article{qin2023openvoice,
  title={OpenVoice: Versatile Instant Voice Cloning},
  author={Qin, Zengyi and Zhao, Wenliang and Yu, Xumin and Sun, Xin},
  journal={arXiv preprint arXiv:2312.01479},
  year={2023}
}
```

## License

OpenVoice V1 and V2 are MIT licensed, free for both commercial and research use.

## Acknowledgements

This implementation builds on several excellent projects: [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), and [VITS2](https://github.com/daniilrobnikov/vits2). Thanks for their awesome work!
openvoice/demo_part1.ipynb
DELETED
@@ -1,236 +0,0 @@
## Voice Style Control Demo

```python
import os
import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter
```

### Initialization

```python
ckpt_base = 'checkpoints/base_speakers/EN'
ckpt_converter = 'checkpoints/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs'

base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)
```

### Obtain Tone Color Embedding

The `source_se` is the tone color embedding of the base speaker. It is an average over multiple sentences generated by the base speaker. We provide the result directly here, but readers are free to extract `source_se` themselves.

```python
source_se = torch.load(f'{ckpt_base}/en_default_se.pth').to(device)
```

The `reference_speaker.mp3` below points to the short audio clip of the reference speaker whose voice we want to clone. We provide an example here. If you use your own reference speakers, please **make sure each speaker has a unique filename.** The `se_extractor` saves the `targeted_se` using the filename of the audio and **will not automatically overwrite.**

```python
reference_speaker = 'resources/example_reference.mp3'  # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)
```

### Inference

```python
save_path = f'{output_dir}/output_en_default.wav'

# Run the base speaker TTS
text = "This audio is generated by OpenVoice."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
```

**Try different styles and speeds.** The style is controlled by the `speaker` parameter of the `base_speaker_tts.tts` method. Available choices: friendly, cheerful, excited, sad, angry, terrified, shouting, whispering. Note that the tone color embedding needs to be updated accordingly. The speed is controlled by the `speed` parameter. Let's try whispering at speed 0.9.

```python
source_se = torch.load(f'{ckpt_base}/en_style_se.pth').to(device)
save_path = f'{output_dir}/output_whispering.wav'

# Run the base speaker TTS
text = "This audio is generated by OpenVoice."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='whispering', language='English', speed=0.9)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
```

**Try different languages.** OpenVoice achieves multi-lingual voice cloning by simply replacing the base speaker. We provide an example with a Chinese base speaker here, and we encourage readers to try `demo_part2.ipynb` for a detailed demo.

```python
ckpt_base = 'checkpoints/base_speakers/ZH'
base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

source_se = torch.load(f'{ckpt_base}/zh_default_se.pth').to(device)
save_path = f'{output_dir}/output_chinese.wav'

# Run the base speaker TTS
text = "今天天气真好,我们一起出去吃饭吧。"  # "The weather is great today; let's go out for a meal."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='Chinese', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
```

**Tech for good.** For people who will deploy OpenVoice for public usage: we offer the option to add a watermark to avoid potential misuse; please see the `ToneColorConverter` class. **MyShell reserves the ability to detect whether an audio is generated by OpenVoice**, whether or not the watermark is added.
openvoice/demo_part2.ipynb
DELETED
@@ -1,195 +0,0 @@
## Cross-Lingual Voice Clone Demo

```python
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
```

### Initialization

```python
ckpt_converter = 'checkpoints/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)
```

In this demo, we use OpenAI TTS as the base speaker to produce multi-lingual speech audio. Users can flexibly change the base speaker according to their own needs. Please create a file named `.env` and place your OpenAI key in it as `OPENAI_API_KEY=xxx`. We have also provided a Chinese base speaker model (see `demo_part1.ipynb`).

```python
from openai import OpenAI
from dotenv import load_dotenv

# Please create a file named .env and place your
# OpenAI key as OPENAI_API_KEY=xxx
load_dotenv()

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="This audio will be used to extract the base speaker tone color embedding. " + \
          "Typically a very short audio should be sufficient, but increasing the audio " + \
          "length will also improve the output audio quality."
)

response.stream_to_file(f"{output_dir}/openai_source_output.mp3")
```

### Obtain Tone Color Embedding

The `source_se` is the tone color embedding of the base speaker. It is an average over multiple sentences with multiple emotions from the base speaker. We provide the result directly here, but readers are free to extract `source_se` themselves.

```python
base_speaker = f"{output_dir}/openai_source_output.mp3"
source_se, audio_name = se_extractor.get_se(base_speaker, tone_color_converter, vad=True)

reference_speaker = 'resources/example_reference.mp3'  # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)
```

### Inference

```python
# Run the base speaker TTS
text = [
    "MyShell is a decentralized and comprehensive platform for discovering, creating, and staking AI-native apps.",
    "MyShell es una plataforma descentralizada y completa para descubrir, crear y apostar por aplicaciones nativas de IA.",
    "MyShell est une plateforme décentralisée et complète pour découvrir, créer et miser sur des applications natives d'IA.",
    "MyShell ist eine dezentralisierte und umfassende Plattform zum Entdecken, Erstellen und Staken von KI-nativen Apps.",
    "MyShell è una piattaforma decentralizzata e completa per scoprire, creare e scommettere su app native di intelligenza artificiale.",
    "MyShellは、AIネイティブアプリの発見、作成、およびステーキングのための分散型かつ包括的なプラットフォームです。",
    "MyShell — это децентрализованная и всеобъемлющая платформа для обнаружения, создания и стейкинга AI-ориентированных приложений.",
    "MyShell هي منصة لامركزية وشاملة لاكتشاف وإنشاء ورهان تطبيقات الذكاء الاصطناعي الأصلية.",
    "MyShell是一个去中心化且全面的平台,用于发现、创建和投资AI原生应用程序。",
    "MyShell एक विकेंद्रीकृत और व्यापक मंच है, जो AI-मूल ऐप्स की खोज, सृजन और स्टेकिंग के लिए है।",
    "MyShell é uma plataforma descentralizada e abrangente para descobrir, criar e apostar em aplicativos nativos de IA."
]
src_path = f'{output_dir}/tmp.wav'

for i, t in enumerate(text):
    response = client.audio.speech.create(
        model="tts-1",
        voice="nova",
        input=t,
    )
    response.stream_to_file(src_path)

    save_path = f'{output_dir}/output_crosslingual_{i}.wav'

    # Run the tone color converter
    encode_message = "@MyShell"
    tone_color_converter.convert(
        audio_src_path=src_path,
        src_se=source_se,
        tgt_se=target_se,
        output_path=save_path,
        message=encode_message)
```
openvoice/demo_part3.ipynb
DELETED
@@ -1,145 +0,0 @@
## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS

```python
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
```

### Initialization

In this example, we use the checkpoints from OpenVoice V2. OpenVoice V2 is trained with more aggressive augmentations and thus demonstrates better robustness in some cases.

```python
ckpt_converter = 'checkpoints_v2/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs_v2'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)
```

### Obtain Tone Color Embedding

We only extract the tone color embedding for the target speaker. The source tone color embeddings can be loaded directly from the `checkpoints_v2/ses` folder.

```python
reference_speaker = 'resources/example_reference.mp3'  # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)
```

#### Use MeloTTS as Base Speakers

MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, and Korean. In the following example, we use the MeloTTS models as the base speakers.

```python
from melo.api import TTS

texts = {
    'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
    'EN': "Did you ever hear a folk tale about a giant turtle?",
    'ES': "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.",
    'FR': "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.",
    'ZH': "在这次vacation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。",
    'JP': "彼は毎朝ジョギングをして体を健康に保っています。",
    'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
}

src_path = f'{output_dir}/tmp.wav'

# Speed is adjustable
speed = 1.0

for language, text in texts.items():
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id

    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        speaker_key = speaker_key.lower().replace('_', '-')

        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
        if torch.backends.mps.is_available() and device == 'cpu':
            torch.backends.mps.is_available = lambda: False
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path,
            src_se=source_se,
            tgt_se=target_se,
            output_path=save_path,
            message=encode_message)
```
openvoice/docs/QA.md
DELETED
@@ -1,39 +0,0 @@
# Common Questions and Answers

## General Comments

**OpenVoice is a Technology, not a Product**

Although it works on the majority of voices when used correctly, please do not expect it to work perfectly in every case, as it takes a lot of engineering effort to turn a technology into a stable product. The targeted users of this technology are developers and researchers, not end users. End users expect a perfect product. However, we are confident in saying that OpenVoice is the state of the art among source-available voice cloning technologies.

The contribution of OpenVoice is a versatile instant voice cloning technical approach, not a ready-to-use, perfect voice cloning product. However, we firmly believe that by releasing OpenVoice, we can accelerate the open research community's progress on instant voice cloning, and that someday free voice cloning methods will be as good as commercial ones.

## Issues with Voice Quality

**Accent and Emotion of the Generated Voice Are Not Similar to the Reference Voice**

First of all, OpenVoice only clones the tone color of the reference speaker. It does NOT clone the accent or emotion. The accent and emotion are controlled by the base speaker TTS model, not cloned by the tone color converter (please refer to our [paper](https://arxiv.org/pdf/2312.01479.pdf) for technical details). If you want to change the accent or emotion of the output, you need a base speaker model with that accent. OpenVoice provides sufficient flexibility for users to integrate their own base speaker model into the framework by simply replacing the base speaker we provide.

**Bad Audio Quality of the Generated Speech**

Please check the following:
- Is your reference audio clean enough, without any background noise? You can find some high-quality reference speech [here](https://aiartes.com/voiceai).
- Is your audio too short?
- Does your audio contain speech from more than one person?
- Does the reference audio contain long blank sections?
- Did you give the reference audio the same name you used before but forget to delete the `processed` folder?

## Issues with Languages

**Support of Other Languages**

For multi-lingual and cross-lingual usage, please refer to [`demo_part2.ipynb`](https://github.com/myshell-ai/OpenVoice/blob/main/demo_part2.ipynb). OpenVoice supports any language as long as you have a base speaker in that language. The OpenVoice team has already done the most difficult part (tone color converter training) for you. A base speaker TTS model is relatively easy to train, and multiple existing open-source repositories support it. If you don't want to train one yourself, simply use the OpenAI TTS model as the base speaker.

## Issues with Installation
**Error Related to Silero**

When calling `get_vad_segments` from `se_extractor.py`, there should be a message like this:
```
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /home/user/.cache/torch/hub/master.zip
```
The download will fail if your machine cannot access GitHub. Please download the zip from "https://github.com/snakers4/silero-vad/zipball/master" manually and unzip it to `/home/user/.cache/torch/hub/snakers4_silero-vad_master`. You can also see [this issue](https://github.com/myshell-ai/OpenVoice/issues/57) for solutions for other versions of Silero.
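The manual workaround above can be sanity-checked in code. A minimal sketch, assuming torch.hub's default cache location of `~/.cache/torch/hub` (if you have set the `TORCH_HOME` environment variable, the path differs):

```python
from pathlib import Path


def silero_hub_dir(home: str = "~") -> Path:
    """Directory where torch.hub expects a manually unzipped silero-vad checkout.

    Assumption: the default TORCH_HOME of ~/.cache/torch is in effect.
    """
    return Path(home).expanduser() / ".cache" / "torch" / "hub" / "snakers4_silero-vad_master"


# Check whether the manual unzip is already in place before running OpenVoice.
target = silero_hub_dir()
if not target.is_dir():
    print(f"silero-vad not found; unzip the GitHub zipball into {target}")
```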
openvoice/docs/USAGE.md
DELETED
@@ -1,83 +0,0 @@
# Usage

## Table of Contents

- [Quick Use](#quick-use): use OpenVoice directly, without installation.
- [Linux Install](#linux-install): for researchers and developers only.
  - [V1](#openvoice-v1)
  - [V2](#openvoice-v2)
- [Install on Other Platforms](#install-on-other-platforms): unofficial installation guides contributed by the community.

## Quick Use

The input speech audio for OpenVoice can be in **any language**. OpenVoice can clone the voice in that speech audio and use the voice to speak in multiple languages. For quick use, we recommend trying the already deployed services:

- [British English](https://app.myshell.ai/widget/vYjqae)
- [American English](https://app.myshell.ai/widget/nEFFJf)
- [Indian English](https://app.myshell.ai/widget/V3iYze)
- [Australian English](https://app.myshell.ai/widget/fM7JVf)
- [Spanish](https://app.myshell.ai/widget/NNFFVz)
- [French](https://app.myshell.ai/widget/z2uyUz)
- [Chinese](https://app.myshell.ai/widget/fU7nUz)
- [Japanese](https://app.myshell.ai/widget/IfIB3u)
- [Korean](https://app.myshell.ai/widget/q6ZjIn)

## Minimal Demo

For users who want to quickly try OpenVoice and do not require high quality or stability, click either of the following links:

<div align="center">
<a href="https://app.myshell.ai/bot/z6Bvua/1702636181"><img src="../resources/myshell-hd.png" height="28"></a>

<a href="https://huggingface.co/spaces/myshell-ai/OpenVoice"><img src="../resources/huggingface.png" height="32"></a>
</div>

## Linux Install

This section is only for developers and researchers who are familiar with Linux, Python, and PyTorch. Clone this repo and run:

```
conda create -n openvoice python=3.9
conda activate openvoice
git clone git@github.com:myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .
```

The installation above is the same whether you are using V1 or V2.

### OpenVoice V1

Download the checkpoint from [here](https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_1226.zip) and extract it to the `checkpoints` folder.

**1. Flexible Voice Style Control.**
Please see [`demo_part1.ipynb`](../demo_part1.ipynb) for an example of how OpenVoice enables flexible style control over the cloned voice.

**2. Cross-Lingual Voice Cloning.**
Please see [`demo_part2.ipynb`](../demo_part2.ipynb) for examples with languages seen or unseen in the MSML training set.

**3. Gradio Demo.** We provide a minimalist local Gradio demo. We strongly suggest that users look into `demo_part1.ipynb`, `demo_part2.ipynb`, and the [QnA](QA.md) if they run into issues with the Gradio demo. Launch a local Gradio demo with `python -m openvoice_app --share`.

### OpenVoice V2

Download the checkpoint from [here](https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip) and extract it to the `checkpoints_v2` folder.

Install [MeloTTS](https://github.com/myshell-ai/MeloTTS):
```
pip install git+https://github.com/myshell-ai/MeloTTS.git
python -m unidic download
```

**Demo Usage.** Please see [`demo_part3.ipynb`](../demo_part3.ipynb) for example usage of OpenVoice V2. It natively supports English, Spanish, French, Chinese, Japanese, and Korean.

## Install on Other Platforms

This section lists unofficial installation guides by open-source contributors in the community:

- Windows
  - [Guide](https://github.com/Alienpups/OpenVoice/blob/main/docs/USAGE_WINDOWS.md) by [@Alienpups](https://github.com/Alienpups)
  - You are welcome to contribute if you have a better installation guide. We will list you here.
- Docker
  - [Guide](https://github.com/StevenJSCF/OpenVoice/blob/update-docs/docs/DF_USAGE.md) by [@StevenJSCF](https://github.com/StevenJSCF)
  - You are welcome to contribute if you have a better installation guide. We will list you here.
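The "download the checkpoint and extract it" steps for V1 and V2 can be scripted. A minimal sketch using only the standard library; `fetch_and_extract` is a hypothetical helper (not part of the OpenVoice package), and the URLs and folder names are the ones given in the instructions above:

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url: str, dest: str) -> Path:
    """Download a checkpoint zip and unpack it into `dest`.

    Hypothetical helper: it just automates the manual download-and-extract
    steps from the docs; it is not shipped with OpenVoice.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / "checkpoint.zip"
    urllib.request.urlretrieve(url, zip_path)   # fetch the archive
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)                 # unpack alongside it
    zip_path.unlink()                           # drop the archive afterwards
    return dest_dir


# V1: fetch_and_extract(".../openvoice/checkpoints_1226.zip", "checkpoints")
# V2: fetch_and_extract(".../openvoice/checkpoints_v2_0417.zip", "checkpoints_v2")
```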
openvoice/requirements.txt
DELETED
@@ -1,16 +0,0 @@
librosa==0.9.1
faster-whisper==0.9.0
pydub==0.25.1
wavmark==0.0.3
numpy==1.22.0
eng_to_ipa==0.0.2
inflect==7.0.0
unidecode==1.3.7
whisper-timestamped==1.14.2
openai
python-dotenv
pypinyin==0.50.0
cn2an==0.5.22
jieba==0.42.1
gradio==3.48.0
langid==1.1.6
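Most entries above are pinned with `==`, while `openai` and `python-dotenv` are unpinned. A small stdlib-only sketch that parses such a list into name/version pairs (unpinned entries map to `None`), which can be handy when diagnosing version-mismatch issues:

```python
def parse_requirements(text: str) -> dict:
    """Map each requirement name to its pinned version, or None if unpinned."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name] = version or None
    return pins


reqs = """librosa==0.9.1
openai
langid==1.1.6"""
print(parse_requirements(reqs))  # {'librosa': '0.9.1', 'openai': None, 'langid': '1.1.6'}
```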
openvoice/resources/demo_speaker0.mp3
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e8a024342499eee94e81cd4c5b18c541c04263dc1865fc3f9a134fe3b135e00
size 308503
openvoice/resources/demo_speaker1.mp3
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:beae7e33f1e7bc21c34d1401947b6c10352a817816a3151ff1efb82133585e24
size 729355
openvoice/resources/demo_speaker2.mp3
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8190e911f82c4b06fde63c55a5599b40ba652950eed704cf2cb54d477cf33978
size 471925
openvoice/resources/example_reference.mp3
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0f5806f6e034e660c46a0b2fe4c597f0a1670859743c14e27a8823a7d169263
size 961326
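The deleted media files are stored as Git LFS pointer files of the three-field form shown above (`version`, `oid`, `size`). A minimal stdlib-only parser sketch:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    fields["size"] = int(fields["size"])     # size is the payload byte count
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d0f5806f6e034e660c46a0b2fe4c597f0a1670859743c14e27a8823a7d169263
size 961326"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 961326
```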
openvoice/resources/framework-ipa.png
DELETED
Binary file (73.6 kB)

openvoice/resources/huggingface.png
DELETED
Binary file (4.13 kB)

openvoice/resources/lepton-hd.png
DELETED
Binary file (44.4 kB)

openvoice/resources/myshell-hd.png
DELETED
Binary file (34.1 kB)

openvoice/resources/openvoicelogo.jpg
DELETED
Git LFS Details

openvoice/resources/tts-guide.png
DELETED
Git LFS Details

openvoice/resources/voice-clone-guide.png
DELETED
Git LFS Details
openvoice/setup.py
DELETED
@@ -1,45 +0,0 @@
from setuptools import setup, find_packages


setup(
    name='MyShell-OpenVoice',
    version='0.0.0',
    description='Instant voice cloning by MyShell.',
    long_description=open('README.md').read().strip(),
    long_description_content_type='text/markdown',
    keywords=[
        'text-to-speech',
        'tts',
        'voice-clone',
        'zero-shot-tts',
    ],
    url='https://github.com/myshell-ai/OpenVoice',
    project_urls={
        'Documentation': 'https://github.com/myshell-ai/OpenVoice/blob/main/docs/USAGE.md',
        'Changes': 'https://github.com/myshell-ai/OpenVoice/releases',
        'Code': 'https://github.com/myshell-ai/OpenVoice',
        'Issue tracker': 'https://github.com/myshell-ai/OpenVoice/issues',
    },
    author='MyShell',
    author_email='ethan@myshell.ai',
    license='MIT License',
    packages=find_packages(),
    python_requires='>=3.9',
    install_requires=[
        'librosa==0.9.1',
        'faster-whisper==0.9.0',
        'pydub==0.25.1',
        'wavmark==0.0.3',
        'numpy==1.22.0',
        'eng_to_ipa==0.0.2',
        'inflect==7.0.0',
        'unidecode==1.3.7',
        'whisper-timestamped==1.14.2',
        'pypinyin==0.50.0',
        'cn2an==0.5.22',
        'jieba==0.42.1',
        'gradio==3.48.0',
        'langid==1.1.6',
    ],
    zip_safe=False,
)