---
license: cc-by-nc-4.0
language:
- en
---

Models trained with [VITS-fast-fine-tuning](https://github.com/Plachtaa/VITS-fast-fine-tuning)

- Three speakers: laoliang (老梁), specialweek, zhongli.
- The model is based on the C+J base model and was trained on a single NVIDIA 3090 for 300 epochs. Training took about 4.5 hours in total.
- The training data consisted of a single long audio recording of laoliang (~5 minutes) plus auxiliary data.

How to run the model?

- Follow [the official instructions](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/LOCAL.md) to install the required libraries.
- Download the models and move _finetune_speaker.json_ and _G_latest.pth_ to _/path/to/VITS-fast-fine-tuning_.
- Run _python VC_inference.py --model_dir ./G_latest.pth --share True_ from that directory to start a local gradio inference demo.

File structure

```bash
VITS-fast-fine-tuning
├───VC_inference.py
├───...
├───finetune_speaker.json
└───G_latest.pth
```
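
Before launching the demo, it can help to confirm that the files are where the inference script expects them. The following is a minimal sketch, not part of VITS-fast-fine-tuning itself: the helper names `missing_files`, `inference_command`, and `launch` are hypothetical, and it assumes the three files sit directly in the repository root, as in the layout described in this README. The command it builds is the one quoted in the run instructions.

```python
import subprocess
from pathlib import Path

# File names taken from this README; assumed to sit in the repository root.
REQUIRED_FILES = ("VC_inference.py", "finetune_speaker.json", "G_latest.pth")


def missing_files(repo_dir):
    """Return the required files that are not present in repo_dir."""
    repo = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (repo / name).is_file()]


def inference_command(python="python"):
    """Build the launch command quoted in the run instructions."""
    return [python, "VC_inference.py",
            "--model_dir", "./G_latest.pth",
            "--share", "True"]


def launch(repo_dir):
    """Hypothetical wrapper: check the layout, then start the demo."""
    missing = missing_files(repo_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {repo_dir}: {', '.join(missing)}")
    # Run from the repository root so the relative model path resolves.
    subprocess.run(inference_command(), cwd=repo_dir, check=True)
```

Calling `launch("/path/to/VITS-fast-fine-tuning")` would raise a `FileNotFoundError` naming any file that has not been moved into place, and otherwise start the demo with the same arguments as the manual command.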