---
license: mit
---
# Amphion Singing Voice Conversion Pretrained Models

## Quick Start

We provide a [DiffWaveNetSVC](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC) pretrained checkpoint for you to try. It was trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:

|       Singer        | Language | Training Duration (mins) |
| :-----------------: | :------: | :----------------------: |
|   David Tao 陶喆    | Chinese  |          45.51           |
|  Eason Chan 陈奕迅  | Chinese  |          43.36           |
|   Feng Wang 汪峰    | Chinese  |          41.08           |
|    Jian Li 李健     | Chinese  |          38.90           |
|     John Mayer      | English  |          30.83           |
|        Adele        | English  |          27.23           |
|    Ying Na 那英     | Chinese  |          27.02           |
|  Yijie Shi 石倚洁   | Chinese  |          24.93           |
| Jacky Cheung 张学友 | Chinese  |          18.31           |
|    Taylor Swift     | English  |          18.31           |
|   Faye Wong 王菲    | English  |          16.78           |
|   Michael Jackson   | English  |          15.13           |
|   Tsai Chin 蔡琴    | Chinese  |          10.12           |
|     Bruno Mars      | English  |           6.29           |
|       Beyonce       | English  |           6.06           |

To have these singers sing the songs of your choice, follow the steps below:

### Step1: Download the acoustic model checkpoint
```bash
git lfs install
git clone https://huggingface.co/amphion/singing_voice_conversion
```
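If Git LFS was not active during the clone, large checkpoint files can be left behind as small text pointer files instead of the real weights. The following is a minimal sketch for spotting that case; the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of Amphion.

```bash
# Return success if the given file is a Git LFS pointer rather than real data.
is_lfs_pointer() {
    # Pointer files are tiny text files whose first line names the LFS spec.
    head -c 200 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every LFS pointer file left under a cloned directory.
find_lfs_pointers() {
    # Only small files can be pointers, so skip anything larger than ~1 KiB.
    find "$1" -type f ! -path '*/.git/*' -size -2k | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `find_lfs_pointers singing_voice_conversion` should print nothing when every file was downloaded. If it lists any files, running `git lfs pull` inside the cloned repository should fetch the real contents.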

### Step2: Download the vocoder checkpoint
```bash
git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
```

### Step3: Clone the Amphion source code from GitHub
```bash
git clone https://github.com/open-mmlab/Amphion.git
```

### Step4: Download the ContentVec checkpoint
You can download the **ContentVec** checkpoint from [this repo](https://github.com/auspicious3000/contentvec). This pretrained model uses `checkpoint_best_legacy_500.pt`, the legacy ContentVec with 500 classes.

### Step5: Specify the checkpoint paths
Use symbolic links to point Amphion at the downloaded checkpoints:

```bash
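# Run from the directory that contains all three cloned repositories.
cd Amphion
mkdir -p ckpts/svc
# Link the acoustic model checkpoint into ckpts/svc.
ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
# Link the vocoder checkpoint into pretrained.
ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing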
```
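A symbolic link is created even when its target does not exist, so it can help to confirm that both links resolve before running the conversion. The sketch below assumes it is run from the Amphion root; `check_link` is a hypothetical helper, not an Amphion command.

```bash
# Return success only if the path is a symlink whose target exists.
check_link() {
    if [ -L "$1" ] && [ -e "$1" ]; then
        printf 'ok: %s -> %s\n' "$1" "$(readlink "$1")"
    else
        printf 'broken or missing: %s\n' "$1" >&2
        return 1
    fi
}
```

For example, `check_link "ckpts/svc/vocalist_l1_contentvec+whisper"` and `check_link "pretrained/bigvgan_singing"` should both report `ok` if the links above were created from the expected directory.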

You also need to move the `checkpoint_best_legacy_500.pt` file you downloaded in **Step4** into `Amphion/pretrained/contentvec`.
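One way to do this move is sketched below. The function name `install_contentvec` is hypothetical, and the location of the downloaded file depends on where you saved it.

```bash
# Move a downloaded ContentVec checkpoint into the folder Amphion expects.
# $1: path to the downloaded checkpoint file (assumed; adjust to your download location)
# $2: path to the Amphion repository root
install_contentvec() {
    src=$1
    dest_dir=$2/pretrained/contentvec
    if [ ! -f "$src" ]; then
        printf 'checkpoint not found: %s\n' "$src" >&2
        return 1
    fi
    # Create the target folder if it is missing, then move the file into it.
    mkdir -p "$dest_dir"
    mv "$src" "$dest_dir/"
}
```

For instance, if the file was saved to `~/Downloads` (an assumed location), you could call `install_contentvec ~/Downloads/checkpoint_best_legacy_500.pt ./Amphion` from the directory that contains the Amphion clone.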

### Step6: Conversion

You can follow [this recipe](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC#4-inferenceconversion) to run the conversion. For example, to make Taylor Swift sing the songs in `[Your Audios Folder]`, run:

```bash
sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
    --config "ckpts/svc/vocalist_l1_contentvec+whisper/args.json" \
    --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
    --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
    --infer_source_audio_dir "[Your Audios Folder]" \
    --infer_vocoder_dir "pretrained/bigvgan_singing" \
    --infer_target_speaker "vocalist_l1_TaylorSwift" \
    --infer_key_shift "autoshift"
```
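To convert the same source folder for several target singers, the command above can be wrapped in a loop. The sketch below is only an illustration: `run_or_echo` and `convert_for_singers` are hypothetical helpers, the per-singer output subfolder is an assumption about how you may want to organize results, and any speaker name other than `vocalist_l1_TaylorSwift` should be checked against the supported values first. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```bash
# Checkpoint folder linked in Step5 (relative to the Amphion root).
expt="ckpts/svc/vocalist_l1_contentvec+whisper"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run_or_echo() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# $1: source audio folder; remaining arguments: target speaker names.
convert_for_singers() {
    source_dir=$1
    shift
    for singer in "$@"; do
        # Assumption: each singer gets its own subfolder under result/.
        run_or_echo sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
            --config "$expt/args.json" \
            --infer_expt_dir "$expt" \
            --infer_output_dir "$expt/result/$singer" \
            --infer_source_audio_dir "$source_dir" \
            --infer_vocoder_dir "pretrained/bigvgan_singing" \
            --infer_target_speaker "$singer" \
            --infer_key_shift "autoshift"
    done
}
```

A call such as `convert_for_singers "[Your Audios Folder]" vocalist_l1_TaylorSwift` prints one `run.sh` command per singer, which you can inspect before running for real.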

**Note**: The supported `infer_target_speaker` values are listed in [singers.json](https://huggingface.co/amphion/singing_voice_conversion/blob/main/vocalist_l1_contentvec%2Bwhisper/singers.json).
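Once the checkpoint is linked as in Step5, the same file is also available locally. The following sketch prints its quoted strings one per line; it assumes the singer names are the only quoted strings in `singers.json`, and `list_speakers` is a hypothetical helper.

```bash
# Print every double-quoted string in a JSON file, one per line.
# Assumption: singer names are the only quoted strings in singers.json.
list_speakers() {
    grep -o '"[^"]*"' "$1" | tr -d '"'
}
```

From the Amphion root, `list_speakers "ckpts/svc/vocalist_l1_contentvec+whisper/singers.json"` should then show the names you can pass to `--infer_target_speaker`.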

## Citations

```bibtex
@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}
```