Questions about the creation of voices or Vits models.

#22
by RACR - opened

Excuse me, I'd like to ask: what application do you need to make a VITS voice file? Can an existing VITS voice file in the PTH format be modified, or does it have to be made with another application? I want to learn how to make VITS voices or voice models in the PTH format. Thank you.

A PTH file is a machine learning model created using PyTorch.
If you want to add more characters you like, you can try to train your own model.
See https://github.com/CjangCjengh/vits
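
To make "a PTH file is a PyTorch model" more concrete, here is a minimal sketch of how one might peek inside such a file. It assumes the checkpoint is a dictionary that stores the weights under a `"model"` key, which is a common layout but not confirmed for any particular file. The helper names `summarize_checkpoint` and `load_checkpoint` are made up for this illustration.

```python
def summarize_checkpoint(checkpoint):
    """Return (name, shape) pairs for the tensor-like entries of a loaded checkpoint."""
    if not isinstance(checkpoint, dict):
        return []
    # Assumption: weights may be nested under "model"; otherwise use the dict itself.
    state = checkpoint.get("model", checkpoint)
    # Keep only entries that look like tensors (anything exposing a .shape).
    return [(name, tuple(value.shape))
            for name, value in state.items()
            if hasattr(value, "shape")]


def load_checkpoint(path):
    """Load a .pth file onto the CPU. Requires PyTorch to be installed."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch
    # Only load files from sources you trust: .pth files are pickle-based.
    return torch.load(path, map_location="cpu")
```

With PyTorch installed, `summarize_checkpoint(load_checkpoint("some_model.pth"))` would list the layer names and shapes, where `some_model.pth` is a placeholder path. This only inspects a file; creating a new voice still means training a model as described in the repository above.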

skytnt changed discussion status to closed

And how do I install PyTorch? Is it installed through the command prompt or through the Python application?

RACR changed discussion status to open

First you need to install CUDA, then install PyTorch. See the guides at https://pytorch.org/.
You can also search for "How to install CUDA and PyTorch".
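
Installation is normally done from the command prompt, using the install command that the selector on https://pytorch.org/ generates for your system. Once that is done, a small script like the following can confirm that it worked. This is only a sketch: the function name `pytorch_status` is made up for illustration, and it reports "not installed" instead of crashing if PyTorch is missing.

```python
import importlib.util


def pytorch_status():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this Python environment.
        return {"installed": False, "version": None, "cuda_available": False}
    import torch
    return {
        "installed": True,
        "version": torch.__version__,
        # False means PyTorch will run on the CPU only.
        "cuda_available": torch.cuda.is_available(),
    }


print(pytorch_status())
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver problem.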

How can I download the model files?

Quick question: why do all the voices in the large pack (including the ones with English names) sound weird when they speak English? Is this a problem of under- or overtraining, or of VITS models in general? Or were they all trained on Japanese/Chinese audio? I am trying to figure out what is needed to create a good voice model. Some people told me I need at least 3-6 hours of clean, transcribed audio, so I have more or less given up on creating a loli voice for my AI assistant. Or do you know of any really high-quality voice models out there that were trained only on English anime voice actors?
