hexgrad/Kokoro-82M · How to finetune on new language?

Chan-Y

4 days ago

I want to finetune Kokoro on a Turkish voice.

eschmidbauer

3 days ago

Read the section "Where is Voice Cloning?" in the Philosophy.

hexgrad

Owner 3 days ago

edited 3 days ago

Yes, that. Also, I have not yet posted recipes for continuing training off the checkpoint uploaded to this repo. I might need to update the Philosophy to include something along the lines of:

Currently, Kokoro is packaged & delivered to you as an end product meant to be used & deployed.

That could change later, but no promises.

However, it should be fairly transparent that Kokoro uses a StyleTTS 2 architecture, which is FOSS/MIT, so rolling your own model is always an option. Multilingual STTS2 models can be and have been trained. A big hurdle is finding a good g2p (grapheme-to-phoneme) solution for your language. I only speak English, so this is a very tight bottleneck, along with data sourcing.
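To make the g2p hurdle concrete, here is a minimal sketch of what a grapheme-to-phoneme step does, using Turkish as the example because its spelling is close to phonemic. The table `TR_G2P` and the functions `turkish_lower` and `naive_g2p` are hypothetical names invented for this illustration. They are not part of Kokoro or StyleTTS 2, and the letter-to-IPA mapping is deliberately simplified: it treats "ğ" as plain vowel lengthening and ignores loanword spellings, stress, and context-dependent consonant variants. A real training pipeline would likely need a far more careful phonemizer.

```python
# Simplified, hypothetical letter-to-IPA table for Turkish (illustration only).
TR_G2P = {
    "a": "a", "b": "b", "c": "dʒ", "ç": "tʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "h": "h", "ı": "ɯ", "i": "i", "j": "ʒ",
    "k": "k", "l": "l", "m": "m", "n": "n", "o": "o", "ö": "ø",
    "p": "p", "r": "ɾ", "s": "s", "ş": "ʃ", "t": "t", "u": "u",
    "ü": "y", "v": "v", "y": "j", "z": "z",
    "ğ": "ː",  # assumption: treat soft g as lengthening the previous vowel
}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules: I -> ı and İ -> i."""
    # Python's str.lower() maps "I" to "i", which is wrong for Turkish,
    # so the two dotted/dotless capitals are handled before lowering.
    return text.replace("I", "ı").replace("İ", "i").lower()


def naive_g2p(text: str) -> str:
    """Map each letter to an IPA symbol; pass unknown characters through."""
    return "".join(TR_G2P.get(ch, ch) for ch in turkish_lower(text))


if __name__ == "__main__":
    for word in ["çay", "Işık", "dağ"]:
        print(word, "->", naive_g2p(word))
```

Even this toy version shows why language knowledge matters: the casing rule for "I" and "İ" is easy to miss without knowing Turkish, and every such mistake ends up in the phoneme sequences the model is trained on.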

For StyleTTS2:

@Respair has trained a Japanese model, Tsukasa and also hosted a demo Space
@patriotyk has trained a Ukrainian model: https://hf.co/spaces/patriotyk/styletts2-ukrainian

Although the above two are dedicated monolingual models, I do not think there is a hard constraint preventing you from stuffing many languages into one model. In the Kokoro TTS Space, v0.23 shares 82M params across 5 languages: English, French, Japanese, Korean, Chinese.

Edit, other multilingual STTS2 models I'm aware of:

I believe Respair has also done Persian
Someone else (can't find their username on HF right now) has done Korean
Another person has done 5-way multilingual: English, German, French, Italian and Spanish

Without gatekeeping, I should warn you that training models (especially STTS) is not for the faint of heart and could take substantial compute, time, and experience (all of which are obtainable) to produce good outcomes. I believe XTTS v2 might support Turkish out of the box, but I have not tried it.

hexgrad changed discussion status to closed 3 days ago