Add a tab for comparing voice cloning between models
Greetings,
Several of the models you are using support voice cloning. Please add a tab in the arena where we can upload audio samples and compare the voice cloning outputs between the models.
Thank you
True, all the current TTS-Arena models support instant voice cloning.
https://github.com/Pendrokar/open-tts-tracker/blob/patch-3/README.md#capability-specifics
Though I don't see a need for human input to evaluate clones.
Besides voice pitch, a good model would also copy the speech patterns, accent, and mannerisms from the input audio sample. Evaluating these with AI is difficult at the moment, so human evaluation would be good.
Which voice prompts are currently used?
This process can be automated by having the voice clone synthesize the text of a sample from the original voice that is not part of the dataset. Do that multiple times, then compare the spectrograms to the original sample. The TTS with the fewest deviations wins. No human voting required.
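As a rough illustration of the spectrogram comparison described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names (`log_spectrogram`, `spectrogram_deviation`, `rank_models`) are hypothetical, the sine tones stand in for real recordings and model outputs, and the clips are compared frame by frame after simple truncation, whereas real speech would probably need time alignment first.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude spectrogram from a short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def spectrogram_deviation(reference, candidate):
    """Mean absolute difference between two log spectrograms.

    Assumption: the clips are roughly time-aligned, so the longer one is
    simply truncated. Real speech would likely need alignment first.
    """
    ref = log_spectrogram(reference)
    cand = log_spectrogram(candidate)
    n = min(len(ref), len(cand))
    return float(np.mean(np.abs(ref[:n] - cand[:n])))

def rank_models(reference, outputs):
    """Sort model names by average deviation over their clips, lowest first."""
    scores = {
        name: float(np.mean([spectrogram_deviation(reference, clip)
                             for clip in clips]))
        for name, clips in outputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Toy data: sine tones standing in for the held-out sample and the clones.
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 220 * t)
outputs = {
    "model_a": [np.sin(2 * np.pi * 222 * t)],  # close to the reference
    "model_b": [np.sin(2 * np.pi * 330 * t)],  # further from the reference
}
ranking = rank_models(reference, outputs)
print(ranking)
```

Lower scores mean the clone's spectrogram is closer to the original sample. Passing several clips per model covers the "do that multiple times" step, since the scores are averaged per model.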
You mean the voice samples used for instant voice cloning? No clue.
People need to benchmark on samples not present in the dataset.