Pendrokar/TTS-Spaces-Arena
Orpheus has been able to get into the Top 5 of available models.
Orpheus-TTS: MohamedRashad/Orpheus-TTS
Conversational Speech Model: sesame/csm-1b
Did you not notice that each ZeroGPU Space comes with a 12-core (or was it 24-core?) server-type CPU? That is more powerful than what you get with a CPU-Upgrade Space. And you get 10 for $9! A bargain!
Which of the 8 models listed in my post?
pip install kokoro, and still 82M parameters.
After 4000 votes, F5 TTS fell near the bottom of the leaderboard. I extracted a sample from Emilia; let us see if that changes anything.
The original Arena's threshold is at 700 votes. But I am sure Kokoro will hold the position. The voice quality actually sounds close to ElevenLabs.
But StyleTTS usually is not very emotional, so it will fail where Edge TTS does: on phrases where the voice has to be sad or angry. For example, Parler Expresso was overly jolly.
self.brag():
Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
True, a sample from the original dataset would probably be the best. My attempt to fetch one from the Emilia dataset was unsuccessful, as the HF dataset viewer can only show the German samples. Emilia's homepage gives an ASMR-y example prompt.
True about the narration-style sample, but that still did not stop XTTS from surpassing F5. Both use the same sample.