Spaces: TTS-AGI/TTS-Arena

"Both are good" and "Both are bad" options?

#29

by kexul - opened Mar 5

Discussion

kexul

Mar 5

edited Mar 5

Could we have "Both are bad" and "Both are good" options? That way the gap between good models and bad models could be larger. Right now their quality seems similar.

Pendrokar

Mar 5

edited Mar 5

The only case where this could apply is when one model does a certain thing well or badly while the other does something else well or badly. But you still have to decide which one did better.

P.S. ElevenLabs audio quality is practically always better because its samples are 44 kHz. I also find Pheme's audio quality generally bad, as well as the voice itself, which is why it rarely gets picked; it is super rare for me to prefer Pheme's sample over any other model's. Likewise, the glitching that often occurs at the end of WhisperSpeech samples makes it hard to appreciate whether it did anything better than the competitor. 😕

mrfakename

TTS AGI org

Mar 5

@Pendrokar All samples are resampled to 22 kHz for a fair comparison.
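
A quick way to check whether a downloaded sample was actually resampled is to read the sample rate from its file header. The sketch below uses Python's standard `wave` module and assumes the sample is saved as a PCM WAV file and that "22 kHz" means 22,050 Hz; both are assumptions, and the helper names `sample_rate` and `is_resampled` are hypothetical.

```python
import wave

# Assumption: "22 kHz" is taken to mean 22,050 Hz.
TARGET_RATE_HZ = 22_050


def sample_rate(path: str) -> int:
    """Return the sample rate stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_resampled(path: str, target: int = TARGET_RATE_HZ) -> bool:
    """True if the file's header reports the target sample rate."""
    return sample_rate(path) == target


if __name__ == "__main__":
    import sys

    # Usage: python check_rate.py sample1.wav sample2.wav ...
    for p in sys.argv[1:]:
        print(f"{p}: {sample_rate(p)} Hz, at target: {is_resampled(p)}")
```

This only reads the header; it says nothing about the audio's actual bandwidth, so a file upsampled from a lower rate would still report the higher rate.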

Pendrokar

Mar 5

@mrfakename Does it? I downloaded a sample: VLC says it is 44 kHz, and Audacity says the actual playback rate is 48 kHz. Here is the sample and a resample to 22 kHz:

mrfakename

TTS AGI org

Mar 5

Ah, maybe there was an issue with the resampling. Let me check the code.

Pendrokar

Mar 6

> Ah, maybe there was an issue with the resampling. Let me check the code.

Works now! ⚖

kexul

Mar 6

> The only case where this could apply is when one model does a certain thing well or badly while the other does something else well or badly. But you still have to decide which one did better.

This can increase the power to discriminate between models, and it has been best practice in Chatbot Arena. We are not just evaluating two models, but a large number of them.
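
To make the idea concrete: arena-style leaderboards such as Chatbot Arena have used Elo-style ratings, where a vote is scored 1 (A wins), 0 (B wins) or 0.5 (tie). Scoring "Both are good" and "Both are bad" as ties is one common choice. The sketch below is a minimal illustration under those assumptions; whether TTS Arena uses this scheme is not stated in this thread, and the vote labels, `update` helper and K = 32 are hypothetical.

```python
from typing import Tuple

K = 32  # Step size per vote; an arbitrary illustrative choice.

# Hypothetical vote labels mapped to A's score; both tie options count as 0.5.
SCORE_A = {"a": 1.0, "b": 0.0, "both_good": 0.5, "both_bad": 0.5}


def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the logistic Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(r_a: float, r_b: float, vote: str, k: float = K) -> Tuple[float, float]:
    """Return new (r_a, r_b) after one vote; a tie counts as half a win for each side."""
    s_a = SCORE_A[vote]
    delta = k * (s_a - expected(r_a, r_b))
    # Zero-sum update: whatever A gains, B loses.
    return r_a + delta, r_b - delta
```

For example, `update(1000, 1000, "a")` returns `(1016.0, 984.0)`, while a "both bad" vote between equally rated models leaves both at 1000. A tie between a higher- and a lower-rated model nudges them toward each other instead of forcing the voter to pick an arbitrary winner.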

Pendrokar

May 1

I've changed my mind. If both TTS models mispronounce a word or a common acronym such as "ETA", then "Both are bad" fits. It is also still unclear whether American English is what is being evaluated.
