# CosyVoice2 0.5B Unofficial Mirror

Unofficial mirror of the CosyVoice2 0.5B model, which is hosted on ModelScope.

Original model: https://www.modelscope.cn/models/iic/CosyVoice2-0.5B
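
Since this repository is a mirror on the Hugging Face Hub, you can also fetch the weights with `huggingface_hub` instead of ModelScope. A minimal sketch; the repo id below is a placeholder for this mirror's actual id:

``` python
# Minimal sketch: download the mirrored weights from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='<this-mirror-repo-id>',  # placeholder: substitute this mirror's actual repo id
    local_dir='pretrained_models/CosyVoice2-0.5B',
)
```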

**Original README:**

---

# CosyVoice

## 👉🏻 [CosyVoice2 Demos](https://funaudiollm.github.io/cosyvoice2/) 👈🏻
[[CosyVoice2 Paper](https://arxiv.org/abs/2412.10117)][[CosyVoice2 Studio](https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B)]

## 👉🏻 [CosyVoice Demos](https://fun-audio-llm.github.io/) 👈🏻
[[CosyVoice Paper](https://fun-audio-llm.github.io/pdf/CosyVoice_v1.pdf)][[CosyVoice Studio](https://www.modelscope.cn/studios/iic/CosyVoice-300M)][[CosyVoice Code](https://github.com/FunAudioLLM/CosyVoice)]

For `SenseVoice`, visit the [SenseVoice repo](https://github.com/FunAudioLLM/SenseVoice) and [SenseVoice space](https://www.modelscope.cn/studios/iic/SenseVoice).

## Roadmap

- [x] 2024/12

    - [x] CosyVoice2-0.5B model release
    - [x] CosyVoice2-0.5B streaming inference with no quality degradation

- [x] 2024/07

    - [x] Flow matching training support
    - [x] WeTextProcessing support when ttsfrd is not available
    - [x] FastAPI server and client

- [x] 2024/08

    - [x] Repetition Aware Sampling (RAS) inference for LLM stability
    - [x] Streaming inference mode support, including KV cache and SDPA for RTF optimization

- [x] 2024/09

    - [x] 25 Hz CosyVoice base model
    - [x] 25 Hz CosyVoice voice conversion model

- [ ] TBD

    - [ ] CosyVoice2-0.5B bistream inference support
    - [ ] CosyVoice2-0.5B training and finetuning recipe
    - [ ] CosyVoice-500M trained with more multilingual data
    - [ ] More...

## Install

**Clone and install**

- Clone the repo
``` sh
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If the submodule clone fails due to network issues, run the following command until it succeeds
cd CosyVoice
git submodule update --init --recursive
```

- Install Conda: see https://docs.conda.io/en/latest/miniconda.html
- Create a Conda env:

``` sh
conda create -n cosyvoice python=3.10
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda, since the conda-forge package works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
```

**Model download**

We strongly recommend downloading our pretrained `CosyVoice-300M`, `CosyVoice-300M-SFT`, and `CosyVoice-300M-Instruct` models and the `CosyVoice-ttsfrd` resource.

If you are an expert in this field and are only interested in training your own CosyVoice model from scratch, you can skip this step.

``` python
# Download the models via the ModelScope SDK
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-25Hz', local_dir='pretrained_models/CosyVoice-300M-25Hz')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
```

``` sh
# Download the models via git (make sure git lfs is installed)
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-25Hz.git pretrained_models/CosyVoice-300M-25Hz
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
```

Optionally, you can unzip the `ttsfrd` resource and install the `ttsfrd` package for better text normalization performance.

Note that this step is not necessary; if you do not install the `ttsfrd` package, WeTextProcessing is used by default.

``` sh
cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
```
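
As an optional sanity check (a small sketch, not part of the original instructions), you can verify which text normalizer will be picked up:

``` python
# Optional check: CosyVoice falls back to WeTextProcessing when ttsfrd is absent.
try:
    import ttsfrd  # provided by the wheel installed above
    print('ttsfrd is available and will be used for text normalization')
except ImportError:
    print('ttsfrd not found; WeTextProcessing will be used instead')
```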

**Basic Usage**

For zero_shot/cross_lingual inference, use the `CosyVoice2-0.5B` or `CosyVoice-300M` model.
For sft inference, use the `CosyVoice-300M-SFT` model.
For instruct inference, use the `CosyVoice-300M-Instruct` model.
We strongly recommend the `CosyVoice2-0.5B` model for its better streaming performance.

First, add `third_party/Matcha-TTS` to your `PYTHONPATH`.

``` sh
export PYTHONPATH=third_party/Matcha-TTS
```
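
If you prefer not to modify the shell environment, the equivalent can be done at the top of your script (a minimal sketch, assuming you run from the repository root):

``` python
# Equivalent to `export PYTHONPATH=third_party/Matcha-TTS`, done in-process.
# Assumes the current working directory is the CosyVoice repository root.
import sys
sys.path.append('third_party/Matcha-TTS')
```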

``` python
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=True, load_onnx=False, load_trt=False)

# zero_shot usage
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# fine-grained control; for the supported control tokens, see cosyvoice/tokenizer/tokenizer.py#L248
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中,他突然[laughter]停下来,因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):
    torchaudio.save('fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# instruct usage
for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '用四川话说这句话', prompt_speech_16k, stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```
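
For sft inference the flow is analogous, but uses the `CosyVoice` class with the `CosyVoice-300M-SFT` model. A minimal sketch, assuming the model was downloaded as above and that '中文女' is one of its built-in speaker ids:

``` python
from cosyvoice.cli.cosyvoice import CosyVoice
import torchaudio

sft = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# sft inference takes plain text plus a built-in speaker id ('中文女' is assumed here)
for i, j in enumerate(sft.inference_sft('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐。', '中文女', stream=False)):
    torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], sft.sample_rate)
```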

**Start web demo**

You can use our web demo page to get familiar with CosyVoice quickly.
The web demo supports sft/zero_shot/cross_lingual/instruct inference.

Please see the demo website for details.

``` sh
# use pretrained_models/CosyVoice-300M-SFT for sft inference, or pretrained_models/CosyVoice-300M-Instruct for instruct inference
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
```

**Advanced Usage**

For advanced users, we provide training and inference scripts in `examples/libritts/cosyvoice/run.sh`.
You can get familiar with CosyVoice by following this recipe.
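
For example, from the repository root (a minimal sketch; the stages it runs and the data paths it expects are defined inside the script itself):

``` sh
cd examples/libritts/cosyvoice
bash run.sh
```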

**Build for deployment**

Optionally, if you want to deploy CosyVoice as a service (grpc or fastapi), you can follow the steps below; otherwise, skip this section.

``` sh
cd runtime/python
docker build -t cosyvoice:v1.0 .
# change iic/CosyVoice-300M to iic/CosyVoice-300M-Instruct if you want to use instruct inference
# for grpc usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
# for fastapi usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && python3 server.py --port 50000 --model_dir iic/CosyVoice-300M && sleep infinity"
cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
```

## Discussion & Communication

You can discuss directly on [GitHub Issues](https://github.com/FunAudioLLM/CosyVoice/issues).

You can also scan the QR code to join our official DingTalk chat group.

<img src="./asset/dingding.png" width="250px">

## Acknowledgements

We borrowed a lot of code from the following projects:

1. [FunASR](https://github.com/modelscope/FunASR)
2. [FunCodec](https://github.com/modelscope/FunCodec)
3. [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS)
4. [AcademiCodec](https://github.com/yangdongchao/AcademiCodec)
5. [WeNet](https://github.com/wenet-e2e/wenet)

## Citations

``` bibtex
@article{du2024cosyvoice,
  title={CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens},
  author={Du, Zhihao and Chen, Qian and Zhang, Shiliang and Hu, Kai and Lu, Heng and Yang, Yexin and Hu, Hangrui and Zheng, Siqi and Gu, Yue and Ma, Ziyang and others},
  journal={arXiv preprint arXiv:2407.05407},
  year={2024}
}
```

## Disclaimer

The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.