TTS-AGI/TTS-Arena · Reviews: February models

Mar 9

Post your reviews of the February TTS models here. The models are:

Mar 9

I wanted to come out bashing WhisperSpeech and Pheme, but actually they all have their own pros and cons.

Short review.

ElevenLabs - Super clear, studio quality, even after being downsampled to 24 kHz by TTS-Arena. Loses to others when its delivery is more monotone than the competitor's.
XTTSv2 - Clear, but not the best voice. Great narration. Sometimes cuts off part of a word at the end of a sentence. Overall, the quality is close to ElevenLabs.
OpenVoice - Clear, but often monotone.
MetaVoice - Muffled voice. Can hallucinate at the end of a sample.
WhisperSpeech - Low stability. Can have cutoffs and hallucinate at the end of a sample.
Pheme - Very bad voice quality. Very unstable, cutoffs... however... I finally understand those Harvard sentences and why they have so few commas. There are times when Pheme correctly pauses mid-sentence, making the sentence more comprehensible. ElevenLabs never does; it plows right through.

Personally, I automatically choose the competitor whenever WhisperSpeech or MetaVoice hallucinates something where it should be silent.

Pendrokar changed discussion title from February model personal reviews to Reviews: February models Mar 26

Mar 26

Severe case of MetaVoice hallucinating:

Frankly, some of the low-performing models should be disabled until they get updated or fixed.

Pendrokar changed discussion status to closed Sep 27