Hexgrad PRO

hexgrad

hexgrad's activity

posted an update about 23 hours ago
Merry Christmas! 🎄 Open sourced a small TTS model at hexgrad/Kokoro-82M
  • 1 reply
posted an update 18 days ago
🚀 Shipmas Day 2.5 🚀 Kokoro v0.22 packs 5 languages in 82M params! 🇺🇸🇬🇧🇫🇷🇯🇵🇰🇷🇨🇳 hexgrad/Kokoro-TTS

Feedback appreciated, both positive and negative. Non-English languages haven't been validated by the model creator(s), so if you're a native speaker, criticize away!

「ココロティーティーエスは、英語と日本語に加えて、中国語、韓国語、フランス語を話すことができるようになりました。」
("Kokoro TTS can now speak Chinese, Korean, and French in addition to English and Japanese.")

WAV converted to MP4 using FFmpeg, since audio attachments aren't allowed in Posts. You may have to unmute the video.
replied to their post 26 days ago

> The voice quality actually sounds close to ElevenLabs.

I might've mentioned this elsewhere, but if you plug Kokoro outputs for named ElevenLabs voices into https://elevenlabs.io/ai-speech-classifier you should get very reliable positives (98% confident generated by ElevenLabs).

By ear, I think Kokoro is indeed close to ElevenLabs, especially on certain voices. For Nicole, they are indistinguishable to me. Michael is pretty close; Adam is still somewhat weak.

> But StyleTTS usually is not very emotional.

I agree. Kokoro also has 2 specific issues in this area: (1) it saw little to no emotional audio during training, and (2) even if it had, the stock voices are style vectors averaged over 10-100 samples, which yields an average/neutral style anyway.
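
Point (2) can be illustrated with a minimal sketch. It assumes each reference sample yields a fixed-length style vector, represented here as a plain list of floats. The 4-dimensional toy vectors and the `average_style` helper are hypothetical and are not taken from Kokoro's code.

```python
import math
import random


def average_style(vectors):
    """Return the element-wise mean of equally sized style vectors."""
    if not vectors:
        raise ValueError("need at least one style vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all style vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


# Toy data (an assumption): 50 per-sample style vectors scattered around
# zero, where zero stands in for a "neutral" style.
rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]

voice = average_style(samples)
mean_sample_norm = sum(norm(s) for s in samples) / len(samples)

# The averaged vector sits much closer to neutral than a typical sample.
print(f"typical sample norm: {mean_sample_norm:.2f}")
print(f"averaged voice norm: {norm(voice):.2f}")
```

In this toy setup the individual deviations largely cancel out, so the averaged "voice" ends up close to the neutral point even though each sample is well away from it.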

posted an update 26 days ago
self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Discounting the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top 3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
- MeloTTS has 52M params
- F5 TTS has 330M params
- XTTSv2 has 467M params
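
For scale, the parameter counts listed above can be compared directly. This is a small sketch that uses only the figures quoted in the post; the dictionary and variable names are illustrative.

```python
# Parameter counts in millions, as quoted in the post above.
params_m = {
    "MeloTTS": 52,
    "Kokoro": 82,
    "F5 TTS": 330,
    "XTTSv2": 467,
}

# Size of each model relative to Kokoro, smallest model first.
relative = {
    name: count / params_m["Kokoro"]
    for name, count in sorted(params_m.items(), key=lambda item: item[1])
}

for name, ratio in relative.items():
    print(f"{name}: {params_m[name]}M params ({ratio:.1f}x Kokoro)")
```

By these numbers, F5 TTS is roughly 4x and XTTSv2 roughly 5.7x the size of Kokoro, while MeloTTS is the only listed model that is smaller.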
replied to fdaudens's post 28 days ago

I used ffmpeg to make the video:

ffmpeg -i input.wav -r 25 -filter_complex "[0:a]compand,showwaves=size=400x400:colors=#ffd700:draw=full:mode=line,format=yuv420p[vout]" -map "[vout]" -map 0:a -c:v libx264 -c:a aac output.mp4
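
The same command can also be built from Python, which may be handy for converting several files. This is a minimal sketch that assumes `ffmpeg` is installed and on the PATH. `build_waveform_cmd` and `wav_to_mp4` are hypothetical helper names, and the flags are copied from the command above.

```python
import subprocess


def build_waveform_cmd(wav_path, mp4_path, size="400x400", color="#ffd700", fps=25):
    """Build the ffmpeg argument list that renders a waveform video from audio."""
    filter_graph = (
        f"[0:a]compand,showwaves=size={size}:colors={color}"
        ":draw=full:mode=line,format=yuv420p[vout]"
    )
    return [
        "ffmpeg",
        "-i", wav_path,                   # input audio
        "-r", str(fps),                   # output frame rate
        "-filter_complex", filter_graph,  # draw the waveform as video
        "-map", "[vout]",                 # video stream from the filter graph
        "-map", "0:a",                    # original audio stream
        "-c:v", "libx264",
        "-c:a", "aac",
        mp4_path,
    ]


def wav_to_mp4(wav_path, mp4_path):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_waveform_cmd(wav_path, mp4_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ffmpeg.
    print(" ".join(build_waveform_cmd("input.wav", "output.mp4")))
```

Passing the arguments as a list avoids shell quoting issues with the brackets and `#` in the filter graph.
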
posted an update 29 days ago
@Respair just dropped Tsukasa: frontier TTS in Japanese Respair/Tsukasa_Speech
It's expressive, punches way above its weight class, and supports voice cloning. Go check it out! 🚀
(Unmute the audio sample below after hitting play)
replied to fdaudens's post 29 days ago
reacted to fdaudens's post with 👍 29 days ago
The rapid progress in small audio models is mind-blowing! 🤯 Just tested OuteTTS v0.2 - cloned my voice from a 10s clip with impressive accuracy and natural prosody.

At 500M parameters, it's efficient enough to run on basic hardware but powerful enough for professional use.

This could transform how we produce audio content for news: think instantly translated interviews that keep the original voices, or scaled audio article production!

Demo and Model on the Hub: OuteAI/OuteTTS-0.2-500M h/t @reach-vb
  • 3 replies
replied to Pendrokar's post about 1 month ago

This is conjecture, but it's possible the voice sample for XTTS is in-distribution, i.e. seen during training, and if so you'd expect it to perform better than F5 given the same reference. No knock on XTTS btw, Kokoro is equally guilty of this: the voice used in the Arena is also in-distribution.

It would not be surprising to me if voice cloning is simply "looking up" the most similar speaker, or an interpolation of speakers, seen in training. François Chollet has discussed this phenomenon many times wrt LLMs, and I highly recommend listening to his talks.

https://hf.co/spaces/hexgrad/Kokoro-TTS/discussions/3#6744bdea8c689a7071742134
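
To make the "looking up" conjecture concrete, here is a toy sketch of what such a lookup would amount to: nearest-neighbor retrieval over speaker embeddings by cosine similarity. The speaker names, the 3-dimensional embeddings, and the `nearest_speaker` helper are all hypothetical. Nothing here reflects how XTTS, F5, or Kokoro actually work.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_speaker(reference, training_speakers):
    """Return (name, similarity) for the training speaker closest to the reference."""
    return max(
        ((name, cosine_similarity(reference, emb)) for name, emb in training_speakers.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical embeddings of speakers seen during training.
training_speakers = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
    "speaker_c": [0.0, 0.0, 1.0],
}

# An in-distribution reference matches a training speaker exactly...
print(nearest_speaker([1.0, 0.0, 0.0], training_speakers))
# ...while an unseen reference only gets its closest approximation.
print(nearest_speaker([0.6, 0.5, 0.1], training_speakers))
```

Under this picture, an in-distribution reference voice gets a perfect match, which is why it would be expected to outperform a reference the model has never seen.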

posted an update about 1 month ago
hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥

Read more and listen to before/after audio samples at https://hf.co/blog/hexgrad/kokoro-short-burst-upgrade

(Probably would have made that Article a Post instead, if audio could be embedded into Posts.)
  • 2 replies
posted an update about 1 month ago