Add Kokoro, the #1 🥇 TTS Model in TTS-Spaces-Arena 🏆 with only 82M params 🤏

#70, opened by hexgrad

Hello, I'd like to request that https://hf.co/spaces/hexgrad/Kokoro-TTS be added to this Arena.

Kokoro is only 82M params. The weights are currently private, but its StyleTTS2 architecture is open.

At the time of this post, Kokoro is internally versioned at v0.19 (checkpoint from 22 Nov 2024) and ranks #1 🥇 on @Pendrokar's https://hf.co/spaces/Pendrokar/TTS-Spaces-Arena, ahead of:
2. Microsoft's EdgeTTS (parameter count unknown)
3. XTTS v2 (467M params)
4. MetaVoice-1B (1B params)
5. Parler Mini (880M params)

At v0.19, Kokoro might not be as flexible as some of these larger models in voice cloning or language support (yet), but much like an NBA 3-point specialist (e.g. Ray Allen, Kyle Korver), Kokoro excels at its strengths: precise English speech with a high Elo rating.

I understand everyone wants their TTS model listed in this Arena. But Kokoro stands out from the rest since it is already a proven contender in another Arena, does more with less, and can be accessed immediately via a semi-private Gradio API.

Feel free to DM @rzvzn on Discord to coordinate. I have also DM'd @mrfakename.

[Screenshot, 2024-12-07 at 11.13.22 AM]

"Kokoro is only 82M params. The weights are currently private but its StyleTTS2 architecture is open."


@lengyue233 Yes 😊 Feel free to audit the inference code in https://hf.co/spaces/hexgrad/Kokoro-TTS/tree/main

@mrfakename @reach-vb @Steveeeeeeen

Edit: this particular voice is actually v0.22x, which is still a WIP and a bit shaky, but I'm happy to submit anything from v0.19 onward, whatever gets me in the door.

Hey!

Thanks for notifying us. We will add the model alongside the others requested in the coming days or weeks.
