# CosyVoice
## Install
**Clone and install**
- Clone the repo
``` sh
git clone https://github.com/modelscope/cosyvoice.git
```
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:
``` sh
conda create -n cosyvoice python=3.8
conda activate cosyvoice
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
```
**Model download**
We strongly recommend downloading our pretrained multi_lingual and multi_emotion models.
If you are an expert in this field and only want to train your own CosyVoice model from scratch, you can skip this step.
``` sh
mkdir -p pretrained_models
git clone https://www.modelscope.cn/CosyVoice/multi_lingual_cosytts.git pretrained_models/multi_lingual_cosytts
git clone https://www.modelscope.cn/CosyVoice/multi_emotion_cosytts.git pretrained_models/multi_emotion_cosytts
```
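Before running inference, it can help to confirm that both clones actually produced files. The following is a minimal sketch, not part of the CosyVoice package: the helper name `missing_model_dirs` is hypothetical, and the only thing it assumes is the directory layout created by the commands above.

``` python
from pathlib import Path

# Directory names taken from the git clone commands above.
EXPECTED_MODEL_DIRS = (
    "pretrained_models/multi_lingual_cosytts",
    "pretrained_models/multi_emotion_cosytts",
)


def missing_model_dirs(root=".", expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that are absent or empty under root.

    Hypothetical helper for illustration; it does not inspect model contents.
    """
    missing = []
    for rel in expected:
        path = Path(root) / rel
        # A directory that exists but holds no entries is treated as missing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_model_dirs():
        print(f"missing or empty: {rel}")
```

Run it from the repository root; no output means both directories exist and are non-empty.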
**Basic Usage**
For zero_shot and sft inference, please use the models in `pretrained_models/multi_lingual_cosytts`.
``` python
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
cosyvoice = CosyVoice('pretrained_models/multi_lingual_cosytts')
# sft usage
print(cosyvoice.list_avaliable_spks())
output = cosyvoice.inference_sft('hello, my name is Jack. What is your name?', 'aishuo')
torchaudio.save('sft.wav', output['tts_speech'], 22050)
# zero_shot usage
prompt_speech_22050 = load_wav('1089_134686_000002_000000.wav', 22050)
output = cosyvoice.inference_zero_shot('hello, my name is Jack. What is your name?', 'It would be a gloomy secret night.', prompt_speech_22050)
torchaudio.save('zero_shot.wav', output['tts_speech'], 22050)
```
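The examples above save audio at 22050 Hz, so the length of a generated clip follows from its sample count. The sketch below shows that arithmetic; `duration_seconds` is a hypothetical helper, and reading the sample count from `output['tts_speech'].shape[1]` assumes the tensor uses the `(channels, samples)` layout that `torchaudio.save` expects by default.

``` python
def duration_seconds(num_samples, sample_rate=22050):
    """Return clip length in seconds for a given number of audio samples.

    Hypothetical helper; 22050 is the sample rate used in the examples above.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# Assumed usage with the inference output from the examples above:
#   seconds = duration_seconds(output['tts_speech'].shape[1])
```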
For instruct inference, please use the models in `pretrained_models/multi_emotion_cosytts`.
``` python
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
cosyvoice = CosyVoice('pretrained_models/multi_emotion_cosytts')
# instruct usage
prompt_speech_22050 = load_wav('1089_134686_000002_000000.wav', 22050)
output = cosyvoice.inference_instruct('hello, my name is Jack. What is your name?', 'It would be a gloomy secret night.', prompt_speech_22050, 'A serene woman articulates thoughtfully in a high pitch and slow tempo, exuding a peaceful and joyful aura.')
torchaudio.save('instruct.wav', output['tts_speech'], 22050)
```
**Advanced Usage**
For advanced users, we provide training and inference scripts in `examples/libritts/cosyvoice/run.sh`.
You can get familiar with CosyVoice by following this recipe.
**Start web demo**
You can use our web demo page to get familiar with CosyVoice quickly.
The web demo supports only zero_shot and sft inference.
Please see the demo website for details.
``` sh
python3 webui.py --port 50000 --model_dir pretrained_models/multi_lingual_cosytts
```
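Both the web demo command above and the deployment example use port 50000, so it may be worth checking that the port is free before starting either one. This is a minimal sketch using only the Python standard library; `port_is_free` is a hypothetical helper and is not part of `webui.py`.

``` python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper: a successful bind only shows the port is free at
    the moment of the check, not that it stays free afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 50000 is the port used in the commands in this README.
    print("port 50000 free:", port_is_free(50000))
```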
**Build for deployment**
Optionally, if you want to use gRPC for service deployment,
run the following steps. Otherwise, you can skip this step.
``` sh
cd runtime/python
docker build -t cosyvoice:v1.0 .
# change multi_lingual_cosytts to multi_emotion_cosytts if you want to use instruct inference
docker run -d --runtime=nvidia -v `pwd`/../../pretrained_models/multi_lingual_cosytts:/opt/cosyvoice/cosyvoice/runtime/pretrained_models -p 50000:50000 cosyvoice:v1.0
python3 client.py --port 50000 --mode <sft|zero_shot|instruct>
```
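To exercise the deployed service in each mode, the client invocation above can be generated programmatically. The sketch below only builds the argument list from the two flags shown above (`--port` and `--mode`); the helper name `client_command` is hypothetical, and any further arguments that `client.py` may need are not covered here.

``` python
import shlex

# Modes listed in the client command above.
MODES = ("sft", "zero_shot", "instruct")


def client_command(mode, port=50000):
    """Build the client.py argument list for one inference mode.

    Hypothetical helper; it uses only the flags shown in this README.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return ["python3", "client.py", "--port", str(port), "--mode", mode]


if __name__ == "__main__":
    for mode in MODES:
        # Print a copy-pasteable shell command for each mode.
        print(shlex.join(client_command(mode)))
```

The list form can be passed to `subprocess.run` from within `runtime/python`. As noted above, instruct mode requires the container to be started with the multi_emotion_cosytts model.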