How to fine-tune on a new language?

by Chan-Y - opened

I want to fine-tune on a Turkish voice.

Read the section "Where is Voice Cloning?" in the Philosophy.

Yes, that. Also, I have not yet posted recipes for continuing training from the checkpoint uploaded to this repo. I might need to update the Philosophy to include something along the lines of:

Currently, Kokoro is packaged & delivered to you as an end product meant to be used & deployed.

That could change later, but no promises.

However, it should be fairly clear that Kokoro uses a StyleTTS 2 architecture, which is FOSS (MIT-licensed), so rolling your own model is always an option. Multilingual STTS2 models can be and have been trained. The big hurdles are finding a good g2p (grapheme-to-phoneme) solution for your language, which is a very tight bottleneck for me since I only speak English, and sourcing data.
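On the g2p point: Turkish spelling is close to phonemic, so a per-letter rule table is a plausible first step before reaching for a trained g2p model. The sketch below is purely illustrative and is not what Kokoro or StyleTTS 2 uses. The table `TR_TO_IPA` and the functions `turkish_lower` and `g2p_tr` are hypothetical names, and the mapping deliberately ignores stress, loanword vowel length, palatalized k/g/l, and most of the behavior of "ğ" (crudely treated as lengthening the preceding vowel).

```python
import unicodedata

# Hypothetical letter -> IPA table for Turkish (illustration only;
# not taken from Kokoro, StyleTTS 2, or any published g2p).
TR_TO_IPA = {
    "a": "a", "b": "b", "c": "dʒ", "ç": "tʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "ğ": "ː", "h": "h", "ı": "ɯ", "i": "i",
    "j": "ʒ", "k": "k", "l": "l", "m": "m", "n": "n", "o": "o",
    "ö": "ø", "p": "p", "r": "ɾ", "s": "s", "ş": "ʃ", "t": "t",
    "u": "u", "ü": "y", "v": "v", "y": "j", "z": "z",
    # Circumflex vowels are flattened to plain vowels (simplification).
    "â": "a", "î": "i", "û": "u",
}


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish dotted/dotless i rules.

    NFC-normalize first so decomposed input (e.g. "c" + combining cedilla)
    matches the table. Python's str.lower() maps "I" to "i" and "İ" to
    "i" + combining dot, both wrong for Turkish, so handle them explicitly.
    """
    text = unicodedata.normalize("NFC", text)
    return text.replace("I", "ı").replace("İ", "i").lower()


def g2p_tr(text: str) -> str:
    """Convert Turkish text to a rough IPA string, word by word.

    Characters missing from the table (punctuation, digits) are dropped.
    """
    words = []
    for word in turkish_lower(text).split():
        phones = [TR_TO_IPA[ch] for ch in word if ch in TR_TO_IPA]
        words.append("".join(phones))
    return " ".join(words)


if __name__ == "__main__":
    print(g2p_tr("Merhaba dünya"))  # meɾhaba dynja
    print(g2p_tr("Işık"))           # ɯʃɯk
```

A real training pipeline would still need number/abbreviation normalization and some handling of stress and loanwords, which is where most of the effort goes.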

For StyleTTS2:

Edit: other multilingual STTS2 models I'm aware of:

  • I believe Respair has also done Persian
  • Someone else (can't find their username on HF right now) has done Korean
  • Another person has done 5-way multilingual: English, German, French, Italian and Spanish

Not to gatekeep, but I should warn you that training models (especially STTS2) is not for the faint of heart: producing good outcomes could take substantial compute, time, and experience, all of which are obtainable. I believe XTTS v2 might support Turkish out of the box, but I have not tried it.

hexgrad changed discussion status to closed
