Spaces: TTS-AGI/TTS-Arena

Problem with OpenVoice

#19

by ZenQin - opened Feb 25

Discussion

ZenQin

Feb 25

•

edited Feb 25

Hello - Thanks for making this project. It's very useful for evaluating TTS models from a human-perception perspective.

I am the author of OpenVoice. I noticed that the OpenVoice samples in the comparisons have some artifacts, and I would love to help you get the right configuration and fix the bugs.

One problem might be that the reference audio you used is not clean enough or is shorter than 30 s. Could you try some clean voices such as
https://aiartes.com/records/aerith_original.mp3
or https://aiartes.com/records/aloy_original.mp3
or some other samples in https://aiartes.com/voiceai.

If OpenVoice has a clean reference audio as input, it should be able to generate very high-quality audio.
Very much appreciated.
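
To illustrate the 30-second guideline mentioned above, here is a minimal sketch that checks the length of a reference clip with Python's standard `wave` module. It assumes the reference has already been converted to PCM WAV (the linked samples are MP3, which `wave` cannot read). The names `wav_duration_seconds`, `is_long_enough`, and `MIN_REFERENCE_SECONDS` are hypothetical and not part of OpenVoice or TTS-Arena. It only measures length and says nothing about how clean the audio is.

```python
import wave

# Threshold taken from the comment above; this is an assumption for illustration,
# not a constant defined by OpenVoice.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_REFERENCE_SECONDS):
    """Return True if the clip at `path` lasts at least `min_seconds`."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("reference.wav")` would return `False` for a 10-second clip, where `reference.wav` is a placeholder file name.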

mrfakename

TTS AGI org Feb 25

I will look into this, thank you for letting us know!

mrfakename

TTS AGI org Feb 25

Hi, I updated the reference voices. Please let me know if it works!

reach-vb

TTS AGI org Feb 25

Thanks @mrfakename ❤️

julien-c

Feb 26

and thanks for chiming in here, @ZenQin 🤗

ZenQin

Feb 26

•

edited Feb 26

I listened to a couple of examples and there are still some artifacts. Could you use this one instead? @mrfakename

mrfakename

TTS AGI org Feb 26

Hi, the audio now uses that sample. Please let me know if it works now!

ZenQin

Feb 27

Thanks. It seems that the current version of OpenVoice tends to produce artifacts on short sentences. We will fix this issue and let you know when it's done.

Pendrokar

Feb 27

•

edited Feb 27

Wait... hold up... @mrfakename you, @reach-vb and the team should really define the rules, as these are all copyrighted samples used for insta-cloning. Everyone might as well just fine-tune their voice model against Scarlett Johansson from the movie "Her" and this would be a competition of the best impersonator. 😑

[Edit] @reach-vb did clarify "In general, we're biased towards recent + open access models which have been trained on more than just LJSpeech or VCTK.", but I just didn't think that would mean training voices on copyrighted materials. 😐

@ZenQin Also, none of the posted samples have a pause between sentences, meaning clips of the game audio were merged into one without padding. So I would expect artifacts.
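
As a rough sketch of what merging clips with padding could look like, the snippet below joins PCM WAV clips and inserts a short stretch of silence between consecutive clips, using only Python's standard `wave` module. The function name `concat_with_silence` and the default gap of 0.3 s are assumptions chosen for illustration. Nothing in this thread says how the posted samples were actually assembled or what gap length works best for OpenVoice.

```python
import wave


def concat_with_silence(input_paths, output_path, gap_seconds=0.3):
    """Join PCM WAV clips, inserting `gap_seconds` of silence between clips."""
    params = None
    chunks = []
    for path in input_paths:
        with wave.open(path, "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                # Mixing formats would corrupt the output, so refuse early.
                raise ValueError(f"{path} has format {fmt}, expected {params}")
            chunks.append(clip.readframes(clip.getnframes()))
    if params is None:
        raise ValueError("no input clips given")

    channels, sample_width, frame_rate = params
    # 8-bit WAV is unsigned (silence is 128); wider PCM is signed (silence is 0).
    silence_byte = b"\x80" if sample_width == 1 else b"\x00"
    gap_frames = int(gap_seconds * frame_rate)
    gap = silence_byte * (gap_frames * channels * sample_width)

    with wave.open(output_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        # bytes.join places the silence only between clips, not at the ends.
        out.writeframes(gap.join(chunks))
```

A call such as `concat_with_silence(["line1.wav", "line2.wav"], "reference.wav")` would write a single reference file with a pause between the two lines; the file names here are placeholders.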

ZenQin

Feb 29

@mrfakename Could you help change to this reference voice instead? It should make the model work more stably for now. Very much appreciated. We will fix the issue I mentioned in my previous reply in OpenVoice V2 and improve the overall quality.

Pendrokar

Feb 29

Zen, now you are definitely trolling. 😅

mrfakename

TTS AGI org Feb 29

@ZenQin Updated! Here's an example generation:

ZenQin

Mar 1

Thanks! I'll close the issue for now.

ZenQin changed discussion status to closed Mar 1
