DiffSinger (SVS version)
0. Data Acquisition
- See apply_form.
- Dataset preview.
1. Preparation
Data Preparation
a) Download and extract PopCS, then create a link to the dataset folder: ln -s /xxx/popcs/ data/processed/popcs
b) Run the following scripts to pack the dataset for training/inference.
export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config usr/configs/popcs_ds_beta6.yaml
# `data/binary/popcs-pmf0` will be generated.
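Before moving on, it can help to confirm that both paths from the steps above exist. The following is a minimal sketch, not part of the repository: `check_popcs_paths` is a hypothetical helper name, and it only checks that the two directories named above are present, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DiffSinger): verify the directories
# used in step 1. Takes the repository root as an optional first argument.
check_popcs_paths() {
    root="${1:-.}"
    rc=0
    # The raw dataset should be reachable through the symlink from step a).
    # `[ -d ]` follows symlinks, so a dangling link is reported as missing.
    if [ -d "$root/data/processed/popcs" ]; then
        echo "ok: data/processed/popcs"
    else
        echo "missing: data/processed/popcs"
        rc=1
    fi
    # binarize.py is expected to write its output here (step b).
    if [ -d "$root/data/binary/popcs-pmf0" ]; then
        echo "ok: data/binary/popcs-pmf0"
    else
        echo "missing: data/binary/popcs-pmf0"
        rc=1
    fi
    return "$rc"
}
```

Run `check_popcs_paths` from the repository root. A non-zero exit status means at least one of the two directories was not found.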
Vocoder Preparation
We provide a pre-trained HifiGAN-Singing model, which is designed specifically for SVS and uses the NSF mechanism.
Please unzip this file into checkpoints before training your acoustic model.
(Update: you can also move a checkpoint with more training steps into this vocoder directory.)
This singing vocoder was trained on ~70 hours of singing data and can be viewed as a universal vocoder.
2. Training Example
First, you need a pre-trained FFT-Singer checkpoint. You can either use the pre-trained model or train FFT-Singer from scratch by running:
# First, train fft-singer;
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/popcs_fs2.yaml --exp_name popcs_fs2_pmf0_1230 --reset
# Then, infer fft-singer;
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/popcs_fs2.yaml --exp_name popcs_fs2_pmf0_1230 --reset --infer
Then, to train DiffSinger, run:
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/popcs_ds_beta6_offline.yaml --exp_name popcs_ds_beta6_offline_pmf0_1230 --reset
Remember to adjust the "fs2_ckpt" parameter in usr/configs/popcs_ds_beta6_offline.yaml to match the path of your FFT-Singer checkpoint.
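The "fs2_ckpt" parameter can be edited by hand, or rewritten with a small script. The sketch below assumes the key sits on a single top-level line of the form `fs2_ckpt: <path>`, which may not match your copy of the config. `set_fs2_ckpt` is a hypothetical helper name, and the checkpoint path in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DiffSinger): point fs2_ckpt at a
# different checkpoint. Assumes a single line of the form "fs2_ckpt: <path>".
set_fs2_ckpt() {
    config="$1"
    ckpt="$2"
    # Refuse to touch the file if the key is not where we expect it.
    if ! grep -q '^fs2_ckpt:' "$config"; then
        echo "fs2_ckpt not found in $config" >&2
        return 1
    fi
    # '|' is the sed delimiter, so slashes in the path need no escaping.
    # Paths containing '|' or '&' would need extra escaping (not handled here).
    sed "s|^fs2_ckpt:.*|fs2_ckpt: $ckpt|" "$config" > "$config.tmp" \
        && mv "$config.tmp" "$config"
}

# Usage (placeholder path, replace with your own checkpoint):
# set_fs2_ckpt usr/configs/popcs_ds_beta6_offline.yaml /path/to/your/fft_singer.ckpt
```

Writing to a temporary file and moving it into place avoids `sed -i`, whose syntax differs between GNU and BSD sed.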
3. Inference Example
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/popcs_ds_beta6_offline.yaml --exp_name popcs_ds_beta6_offline_pmf0_1230 --reset --infer
We also provide:
- the pre-trained model of DiffSinger;
- the pre-trained model of FFT-Singer for the shallow diffusion mechanism in DiffSinger;
Remember to put the pre-trained models in the checkpoints directory.
Note that:
- the original PWG vocoder used in the paper has been put into commercial use, so we provide this HifiGAN vocoder as a substitute.
- we assume the ground-truth F0 is given as the pitch information, following [1][2][3]. If you want to conduct experiments on MIDI data, you need an external F0 predictor (like MIDI-A-version) or joint prediction with spectrograms (like MIDI-B-version).
[1] Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer. Interspeech 2020.
[2] Sequence-to-Sequence Singing Synthesis Using the Feed-Forward Transformer. ICASSP 2020.
[3] DeepSinger: Singing Voice Synthesis with Data Mined From the Web. KDD 2020.