TTS Voice Clone Error
Hello,
I'd like to report an issue with the text-to-speech (TTS) library. When I test it with the input "Merhaba" (Hello), the TTS output speaks content from the reference audio recording instead of the text I provided.
Thank you for your assistance in resolving this issue.
Ahh so it isn't just me. I have experienced this as well. Because my first step was to read the documentation in full, I am aware that the recommendation for the reference is:
- 6 seconds or more of audio reference
- No background noise/mic bumps etc
- Cleaner audio file
- No big silences on reference audio (at start and end especially)
As an audio engineer, I have ensured that this is the case. That's all I can say to make you believe me. Yet I consistently struggle with both Turkish and Portuguese (the two that I've been trying to use). I have a couple of other things that I want to try in order to tackle it, but I can definitely say that I am experiencing a language bias.
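For anyone who wants to sanity-check a reference clip against the checklist above, here is a minimal sketch using only the Python standard library. It is not part of the TTS library: `check_reference` is a hypothetical helper, it assumes a 16-bit PCM WAV file, and the 0.3 s edge-silence limit and 0.01 amplitude threshold are illustrative guesses (only the 6-second minimum comes from the documentation quoted above). It does not attempt to detect background noise or mic bumps.

```python
import array
import sys
import wave


def check_reference(path, min_seconds=6.0, max_edge_silence=0.3,
                    silence_threshold=0.01):
    """Return a list of problems found in a 16-bit PCM WAV reference clip.

    The default limits are assumptions for illustration, except for the
    6-second minimum, which is the documented recommendation.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    # Peak level per frame (loudest channel), scaled to the range 0.0-1.0.
    frames = len(samples) // channels
    peaks = [
        max(abs(s) for s in samples[i * channels:(i + 1) * channels]) / 32768.0
        for i in range(frames)
    ]

    problems = []
    duration = frames / rate
    if duration < min_seconds:
        problems.append(
            f"only {duration:.1f}s of audio, want at least {min_seconds:g}s"
        )

    # Indices of frames that are louder than the silence threshold.
    loud = [i for i, p in enumerate(peaks) if p > silence_threshold]
    if not loud:
        problems.append("clip is entirely silent")
        return problems

    leading = loud[0] / rate
    trailing = (frames - 1 - loud[-1]) / rate
    if leading > max_edge_silence:
        problems.append(f"{leading:.2f}s of leading silence")
    if trailing > max_edge_silence:
        problems.append(f"{trailing:.2f}s of trailing silence")
    return problems
```

Calling `check_reference("reference.wav")` (the file name is a placeholder) returns an empty list when the clip passes these checks, otherwise a list of short descriptions of what to fix.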
I really hope that someone who cares about the other languages, perhaps from Coqui, will offer something more than a copy-paste of documentation that I can look up myself. Gorkem seems to be the one on the socials who is closest to who we need.
But, erenfazlioglu, it is very likely that we'll have to solve this ourselves. I have an idea; I'm just working on another project right now. When I test it, I'll get back to you. If you have the know-how to test it without being guided, it's basically a small-scale implementation of https://browse.arxiv.org/pdf/2309.08255.pdf
I'm eager to explore your suggested solution and will keep in touch.
Best regards
@Feanix @erenfazlioglu -- join the Discord, and check out the #research channel... I think you'll find some good collaboration to move forward on this: https://discord.gg/EVnRWFY9nX
@Feanix
@erenfazlioglu
The issue you described is solved in the V2 version of the model:
https://huggingface.co/coqui/XTTS-v2