"Both are good" and "Both are bad" options?

#29
by kexul - opened

Could we have "Both are bad" and "Both are good" options? That way the gap between good and bad models could become larger. Right now their quality seems similar.

The only cases where this could apply are when one model does a certain thing well (or badly) while the other does something else well (or badly). But you would still have to decide which one did better.

P.S. ElevenLabs audio quality is practically always better because of its 44 kHz samples. I also find Pheme's audio quality generally bad, as well as the voice itself, which is why it rarely gets picked; it is super rare for me to prefer Pheme's sample over any other model. Likewise, the glitching that often occurs at the end of WhisperSpeech samples makes it hard to appreciate whether it did anything better than the competitor. 😕

TTS AGI org

@Pendrokar all samples are resampled to 22 kHz for a fair comparison.

@mrfakename are they? I downloaded a sample and VLC says it is 44 kHz, while Audacity says the actual playback rate is 48 kHz. Here is the sample and a resample to 22 kHz:

TTS AGI org

Ah, maybe there was an issue with the resampling. Let me check the code


Works now! ⚖
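A quick way to check the sample rate a downloaded clip actually declares, without opening VLC or Audacity, is to read its WAV header. Below is a minimal sketch using Python's standard `wave` module. It assumes the arena serves WAV files and that "22 kHz" means 22050 Hz; the helper names (`wav_sample_rate`, `make_silent_wav`, `is_resampled_to`) are hypothetical and not part of the arena's code. Note that it only reads the rate stored in the header, not how the audio was produced.

```python
import io
import wave


def wav_sample_rate(data: bytes) -> int:
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getframerate()


def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Build a tiny 16-bit mono silent WAV in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


def is_resampled_to(data: bytes, target_hz: int = 22050) -> bool:
    """Assumption: '22 kHz' in this thread means 22050 Hz."""
    return wav_sample_rate(data) == target_hz


if __name__ == "__main__":
    clip = make_silent_wav(44100)
    print(wav_sample_rate(clip), is_resampled_to(clip))
```

For a real download, pass the file's bytes (e.g. `open("sample.wav", "rb").read()`) to `wav_sample_rate`.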

The only cases where this could apply are when one model does a certain thing well (or badly) while the other does something else well (or badly). But you would still have to decide which one did better.

This can increase the discriminative power of the ranking across models, and it has been the best practice in Chatbot Arena. We are not evaluating just two models, but a large number of them.
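To illustrate how a tie vote could feed into a leaderboard, here is a minimal sketch of a standard Elo update in which a tie scores 0.5 for each side. It assumes an Elo-style rating purely for illustration; it is not this arena's actual scoring code, and `K = 32` and the function names are arbitrary, hypothetical choices.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated (r_a, r_b).

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie
    (e.g. a hypothetical "Both are good" / "Both are bad" vote).
    """
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Two equally rated models whose samples are equally bad.
    print(elo_update(1000, 1000, 1.0))  # forced pick: one side gains 16 points
    print(elo_update(1000, 1000, 0.5))  # tie: ratings stay where they are
```

With a forced choice, a coin-flip vote between two equally bad samples still moves both ratings by 16 points; a tie leaves them unchanged, so such votes add less noise to the ranking.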

I changed my mind. If both TTS models mispronounce a word or a common acronym such as "ETA", then "Both are bad" fits. Also, it is still unclear whether American English is what is being evaluated.
