Add a tab for comparing voice cloning between models

#14
by SilentAntagonist - opened

Greetings,

Several of the models you are using support voice cloning. Please add a tab in the arena where we can upload audio samples and compare the voice cloning outputs between the models.

Thank you

True, all the current TTS-Arena models support instant voice cloning.
https://github.com/Pendrokar/open-tts-tracker/blob/patch-3/README.md#capability-specifics

Though I don't see a need for human input to evaluate clones. 😕

Besides voice pitch, a good model would also copy the speech patterns, accent and mannerisms from the input audio sample. Evaluating these with AI is difficult at the moment, so human evaluation would be good.

What voice prompts are currently used?

Besides voice pitch, a good model would also copy the speech patterns, accent and mannerisms from the input audio sample. Evaluating these with AI is difficult at the moment, so human evaluation would be good.

This process can be automated: have the voice clone synthesize the text of a sample from the original voice that is not part of the dataset. Do that multiple times, then compare the spectrograms to the original sample. The TTS with the fewest deviations wins. No human voting required.
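
A rough sketch of that idea in Python, using only NumPy. This is an illustration under assumptions, not what TTS-Arena does: the function names (`log_spectrogram`, `spectral_deviation`, `rank_models`), the frame sizes, and the mean-absolute-difference score are all made up for the example. It also compares frames one-to-one, so it assumes the clone and the original are roughly time-aligned; real outputs would likely need alignment (for example dynamic time warping) first.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Pad very short clips so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def spectral_deviation(reference, candidate):
    """Mean absolute log-spectrogram difference (hypothetical score, lower is closer)."""
    ref = log_spectrogram(reference)
    cand = log_spectrogram(candidate)
    n = min(len(ref), len(cand))  # compare only the overlapping frames
    return float(np.mean(np.abs(ref[:n] - cand[:n])))


def rank_models(reference, outputs_by_model):
    """Average the deviation over several runs per model and sort best-first.

    outputs_by_model maps a model name to a list of waveforms, one per run,
    each synthesizing the same held-out text as the reference recording.
    """
    scores = {
        name: float(np.mean([spectral_deviation(reference, w) for w in waves]))
        for name, waves in outputs_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_models(original_waveform, {"model_a": [run1, run2], "model_b": [run1, run2]})` would return the models ordered by average deviation, with the closest clone first. Whether a low spectrogram distance actually tracks accent and mannerisms is an open question, which is the point being debated here.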

What voice prompts are currently used?

You mean the voice samples used for instant voice cloning? No clue.

People need to benchmark on samples not present in the dataset
