Problem with OpenVoice

#19
by ZenQin - opened

Hello - Thanks for making this project. It's very useful for evaluating TTS models from a human-perception perspective.

I am the author of OpenVoice. I noticed that the OpenVoice samples in the comparisons have some artifacts, and I would love to help you get the right configuration and fix the bugs.

One problem might be that the reference audio you used is not clean enough or is shorter than 30 s. Could you try some clean voices such as
https://aiartes.com/records/aerith_original.mp3
or https://aiartes.com/records/aloy_original.mp3
or some other samples in https://aiartes.com/voiceai.

If OpenVoice has a clean reference audio as input, it should be able to generate very high-quality audio.
Very much appreciated.

TTS AGI org

I will look into this, thank you for letting us know!

TTS AGI org

Hi, I updated the reference voices. Please let me know if it works!

TTS AGI org

Thanks @mrfakename ❤️

and thanks for chiming in here, @ZenQin 🤗

I listened to a couple of examples and there are still some artifacts. Could you use this one instead? @mrfakename

TTS AGI org

Hi, the audio now uses that sample. Please let me know if it works now!

Thanks. It seems that the current version of OpenVoice tends to produce artifacts on short sentences. We will fix this issue and let you know when it's done.

Wait... hold up... @mrfakename, you, @reach-vb, and the team should really define the rules, as these are all copyrighted samples used for insta-cloning. Everyone might as well just fine-tune their voice model on Scarlett Johansson from the movie "Her", and this would become a competition for the best impersonator. 😑

[Edit] @reach-vb did clarify "In general, we're biased towards recent + open access models which have been trained on more than just LJSpeech or VCTK.", but I just didn't think it would mean training voices on copyrighted material.

@ZenQin Also, none of the posted samples have a pause between sentences, meaning clips of the game audio were merged into one file without padding. So I would expect artifacts.
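For anyone assembling a reference clip from several short voice lines, here is a minimal sketch of joining them with a silent gap instead of butting them together, plus a duration check against the 30 s guideline mentioned earlier in the thread. It is a hypothetical helper, not part of OpenVoice or this project. It uses only Python's standard `wave` module, assumes all clips are WAV files with the same channel count, sample width, and sample rate, and the 500 ms default gap is an arbitrary choice.

```python
import io
import wave


def concat_with_padding(clips, pad_ms=500):
    """Join WAV clips (given as bytes) into one WAV, inserting pad_ms of silence between them.

    Hypothetical helper; assumes every clip shares channels, sample width and sample rate.
    """
    params = None
    frames = []
    for clip in clips:
        with wave.open(io.BytesIO(clip), "rb") as w:
            p = w.getparams()
            if params is None:
                params = p
            elif (p.nchannels, p.sampwidth, p.framerate) != (
                params.nchannels, params.sampwidth, params.framerate
            ):
                raise ValueError("all clips must share channels, sample width and rate")
            frames.append(w.readframes(w.getnframes()))
    if params is None:
        raise ValueError("no clips given")

    # 8-bit WAV is unsigned (silence = 0x80); wider sample widths are signed (silence = 0x00).
    fill = b"\x80" if params.sampwidth == 1 else b"\x00"
    pad_frames = params.framerate * pad_ms // 1000
    silence = fill * (pad_frames * params.nchannels * params.sampwidth)

    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(params.sampwidth)
        w.setframerate(params.framerate)
        w.writeframes(silence.join(frames))  # silence only between clips, not at the ends
    return out.getvalue()


def duration_s(wav_bytes):
    """Length of a WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def make_clip(seconds, rate=16000):
    """Build a mono 16-bit test clip of the given length (stand-in for a real voice line)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x01\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    merged = concat_with_padding([make_clip(1.0), make_clip(1.0)], pad_ms=500)
    total = duration_s(merged)
    print(f"{total:.2f} s, long enough for a reference: {total >= 30}")
```

With real voice lines you would read each file's bytes instead of calling `make_clip`. Padding only between clips keeps the sentence boundaries audible without adding leading or trailing silence.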

@mrfakename Could you switch to this reference voice instead? It should make the model more stable for now. Very much appreciated. We will fix the issue I mentioned in my previous reply in OpenVoice V2 and improve the overall quality.

Zen, now you are definitely trolling. ๐Ÿ˜…

TTS AGI org

@ZenQin Updated! Here's an example generation:

Thanks! I'll close the issue for now.

ZenQin changed discussion status to closed
