Reviews: February models

#36, opened by Pendrokar

Post your reviews of the February TTS models here. The models are:

  • ElevenLabs
  • XTTSv2
  • OpenVoice
  • MetaVoice
  • WhisperSpeech
  • Pheme

I was ready to come out bashing WhisperSpeech and Pheme, but they actually all have their own pros and cons.

Short reviews:

  • ElevenLabs - Super clear, studio quality, even after being downsampled to 24 kHz by TTS-Arena. Loses when its delivery is more monotone than the competitor's.
  • XTTSv2 - Clear, but not the best voice. Great narration. Sometimes cuts off part of a word at the end of a sentence. Overall quality is close to ElevenLabs.
  • OpenVoice - Clear, but often monotone.
  • MetaVoice - Muffled voice. Can hallucinate at the end of a sample.
  • WhisperSpeech - Low stability; can cut off and hallucinate at the end of a sample.
  • Pheme - Very bad voice quality. Very unstable, with cutoffs... however... I finally understand those Harvard sentences and why they have so few commas. There are times when Pheme correctly pauses mid-sentence, making the sentence easier to follow. ElevenLabs never does; it plows right through.

I automatically choose the competitor whenever WhisperSpeech or MetaVoice hallucinates sound where there should be silence.

Pendrokar changed discussion title from February model personal reviews to Reviews: February models

Severe case of MetaVoice hallucinating:

Frankly, some of the low-performing models should be disabled until they are updated or fixed.
