Tortoise TTS generates a voice that reads syllable by syllable and doesn't sound close to the examples

#1
by drewdru - opened

Hello. I'm trying to use your model with tortoise-tts but it sounds odd.
I added ruslan.pth to the .model directory and tried to run it with this command:

python tortoise/do_tts.py --text "Мне всегда готовы предоставить работу, которая обеспечит нормальное биологическое существование." --voice ruslan --preset fast --model_dir .model

and with this Google Colab notebook.

I also tried the high_quality preset and put 1000 examples in tortoise/voices/ruslan, but that didn't help at all.
How can I improve the quality of the generated voice with this model? Should I use special parameters for do_tts?

Hey, @drewdru ! Here's a sample that I quickly got with random samples using https://git.ecker.tech/mrq/ai-voice-cloning/ .

Screenshot 2023-08-08 at 10.51.30 pm.png

Are you getting results similar to the above?

The result wasn't as good as yours.

Where should I put ruslan.pth? Is this the right model path: ai-voice-cloning/models/tortoise/ruslan.pth?
After generation I also got an extra model file, ai-voice-cloning/voices/ruslan/cond_latents_d1f79232.pth. Is that expected?
How many audio files do you use in the voices directory? If you use all the dataset files there, could you provide that model too?

@drewdru could you confirm that you're selecting the right model in the settings? You need to select it and then click "Reload TTS".

Thank you ^o^
I added ruslan.pth to models/finetunes/ruslan.pth, selected it as the Autoregressive Model on the Settings tab, and then selected "Model (Re)Load TTS".
Now it works great.
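
For anyone who lands here later, the file placement that fixed it for me can be sketched as a small script. This is only a sketch under assumptions: the function name install_finetune is hypothetical (it is not part of ai-voice-cloning or tortoise-tts), and it assumes you pass in the root of your own ai-voice-cloning checkout. The only path taken from this thread is models/finetunes/ruslan.pth.

```python
import shutil
from pathlib import Path


def install_finetune(checkpoint: Path, repo_root: Path) -> Path:
    """Copy a fine-tuned checkpoint into <repo_root>/models/finetunes/.

    Hypothetical helper: it only mirrors the manual step described in this
    thread and does not call any ai-voice-cloning or tortoise-tts API.
    """
    if not checkpoint.is_file():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint}")

    # models/finetunes is the location that worked in this thread.
    target_dir = repo_root / "models" / "finetunes"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / checkpoint.name
    if not target.exists():
        # Leave an existing file alone rather than overwrite it.
        shutil.copy2(checkpoint, target)
    return target


# Example call (paths are assumptions about your local layout):
# install_finetune(Path("ruslan.pth"), Path("ai-voice-cloning"))
```

Copying the file is not enough on its own: the model still has to be selected as the Autoregressive Model on the Settings tab and the TTS reloaded, as described above.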
