RMSnow WelkinFang committed on
Commit d9043cd
1 Parent(s): 408ce0f

Update README.md of svc (#2)


- add guidance of vocoder and contentvec (8f4b4eb9cba11c6a3c615487aef7c823a9a58efc)


Co-authored-by: Zihao Fang <WelkinFang@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +19 -7
README.md CHANGED
@@ -27,27 +27,38 @@ We provide a [DiffWaveNetSVC](https://github.com/open-mmlab/Amphion/tree/main/eg
 
 To make these singers sing the songs you want to listen to, just run the following commands:
 
-### Step1: Download the checkpoint
+### Step1: Download the acoustics model checkpoint
 ```bash
 git lfs install
 git clone https://huggingface.co/amphion/singing_voice_conversion
 ```
 
-### Step2: Clone the Amphion's Source Code of GitHub
+### Step2: Download the vocoder checkpoint
+```bash
+git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
+```
+
+### Step3: Clone Amphion's source code from GitHub
 ```bash
 git clone https://github.com/open-mmlab/Amphion.git
 ```
 
-### Step3: Specify the checkpoint's path
-Use the soft link to specify the downloaded checkpoint in first step:
+### Step4: Download the ContentVec checkpoint
+You can download the **ContentVec** checkpoint from [this repo](https://github.com/auspicious3000/contentvec). In this pretrained model, we used `checkpoint_best_legacy_500.pt`, which is the legacy ContentVec with 500 classes.
+
+### Step5: Specify the checkpoints' paths
+Use soft links to point to the downloaded checkpoints:
 
 ```bash
 cd Amphion
-mkdir ckpts/svc
-ln -s ../singing_voice_conversion/vocalist_l1_contentvec+whisper ckpts/svc/vocalist_l1_contentvec+whisper
+mkdir -p ckpts/svc
+ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
+ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing
 ```
 
-### Step4: Conversion
+Also, move the `checkpoint_best_legacy_500.pt` you downloaded in **Step4** into `Amphion/pretrained/contentvec`.
+
+### Step6: Conversion
 
 You can follow [this recipe](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC#4-inferenceconversion) to conduct the conversion. For example, if you want to make Taylor Swift sing the songs in the `[Your Audios Folder]`, just run:
 
@@ -57,6 +68,7 @@ sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
 --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
 --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
 --infer_source_audio_dir [Your Audios Folder] \
+--infer_vocoder_dir "pretrained/bigvgan_singing" \
 --infer_target_speaker "vocalist_l1_TaylorSwift" \
 --infer_key_shift "autoshift"
 ```
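The symlinking step added in this commit is easy to get wrong when re-run or run from the wrong directory. A small sketch of it as a re-runnable helper; the function name `link_svc_ckpts` and its two-argument layout are our own, while the repository and checkpoint directory names come from the README's clone commands:

```shell
# Sketch: the README's Step5 wrapped in an idempotent helper.
#   $1: directory containing the two cloned Hugging Face repos
#   $2: the Amphion checkout
link_svc_ckpts() {
    src="$1"
    amphion="$2"
    mkdir -p "$amphion/ckpts/svc" "$amphion/pretrained"
    # -sfn replaces an existing link, so re-running is harmless
    ln -sfn "$(realpath "$src/singing_voice_conversion/vocalist_l1_contentvec+whisper")" \
        "$amphion/ckpts/svc/vocalist_l1_contentvec+whisper"
    ln -sfn "$(realpath "$src/BigVGAN_singing_bigdata/bigvgan_singing")" \
        "$amphion/pretrained/bigvgan_singing"
}
```

Using absolute link targets via `realpath`, as the diff itself does, keeps the links valid no matter which directory `run.sh` is later invoked from.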
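Before launching the Step6 conversion it can save a failed run to confirm that all three checkpoint locations wired up in the earlier steps actually resolve. A hypothetical pre-flight check (the `check_svc_layout` helper is ours; the three paths are the ones the README expects):

```shell
# Sketch: verify the checkpoint layout produced by Steps 1-5 before run.sh.
#   $1: the Amphion checkout
check_svc_layout() {
    amphion="$1"
    status=0
    for p in \
        "ckpts/svc/vocalist_l1_contentvec+whisper" \
        "pretrained/bigvgan_singing" \
        "pretrained/contentvec/checkpoint_best_legacy_500.pt"
    do
        if [ ! -e "$amphion/$p" ]; then
            echo "missing: $p" >&2
            status=1
        fi
    done
    return $status
}
```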