Jason-Lu / Laoliang-voice-clone


	---
	license: cc-by-nc-4.0
	language:
	- en
	---
	Models trained with [VITS-fast-fine-tuning](https://github.com/Plachtaa/VITS-fast-fine-tuning).
	- Three speakers: laoliang (老梁), specialweek, zhongli.
	- The model was fine-tuned from the C+J base model for 300 epochs on a single NVIDIA RTX 3090, which took about 4.5 hours in total.
	- The laoliang training data is a single long audio recording (~5 minutes), supplemented with auxiliary data.

	How to run the model?
	- Follow [the official instructions](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/LOCAL.md) to install the required libraries.
	- Download the model files and move _finetune_speaker.json_ and _G_latest.pth_ into _/path/to/VITS-fast-fine-tuning_.
	- Run _python VC_inference.py --model_dir ./G_latest.pth --share True_ from that directory to start a local Gradio inference demo.
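
The steps above can be sketched as a small pre-flight check in Python. This is only an illustration, not part of VITS-fast-fine-tuning: the helper names `missing_files` and `launch_command` are hypothetical, and the file names and command are copied from the steps above.

```python
from pathlib import Path

# File names taken from the steps above.
REQUIRED_FILES = ("VC_inference.py", "finetune_speaker.json", "G_latest.pth")


def missing_files(repo_dir):
    """Return the required file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def launch_command():
    """Build the demo command from the steps above as an argument list."""
    return [
        "python", "VC_inference.py",
        "--model_dir", "./G_latest.pth",
        "--share", "True",
    ]
```

If `missing_files("/path/to/VITS-fast-fine-tuning")` returns an empty list, the command can be started with `subprocess.run(launch_command(), cwd="/path/to/VITS-fast-fine-tuning", check=True)`.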

	File structure
	```bash
	VITS-fast-fine-tuning
	├───VC_inference.py
	├───...
	├───finetune_speaker.json
	└───G_latest.pth
	```
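
With the files in place, a quick sanity check is to list the speakers stored in _finetune_speaker.json_. The sketch below assumes the config has a top-level `"speakers"` object mapping each speaker name to an integer id; that layout is an assumption and should be checked against the downloaded file. The helper name `list_speakers` is hypothetical.

```python
import json
from pathlib import Path


def list_speakers(config_path):
    """Return speaker names from the config, ordered by speaker id.

    Assumes a top-level "speakers" object of the form {name: id}.
    Returns an empty list if that key is absent.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    speakers = config.get("speakers", {})
    return sorted(speakers, key=speakers.get)
```

For this model, `list_speakers("finetune_speaker.json")` would be expected to contain the three speakers named above, provided the assumed layout holds.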