Procedure to add new style-prompted TTS models to the Arena

#1
by ajd12342 - opened

Thanks for taking the initiative to create this expressive TTS arena! Currently the arena supports only two TTS models, served by Hume AI and ElevenLabs. Do you plan to add support for other style-prompted TTS models? I recently released a model, https://huggingface.co/ajd12342/parler-tts-mini-v1-paraspeechcaps, that is one of the first open-source models to support a large set of style tags. It is Parler-TTS fine-tuned on my large-scale ParaSpeechCaps dataset (https://arxiv.org/abs/2503.04713). If you plan to add new models to expand the scope of this arena, I would be happy to make a pull request or work with your team to get this model added; please let me know what your procedure is! Thank you.

This space feels scammy. My guess is they're cherry picking outputs to make their own model look better.
image.png

@clayshoaf Thanks for raising your concerns about the Expressive TTS Arena.

To clarify our approach: we default to Hume-to-Hume comparisons for manually entered text to avoid inconsistencies that might arise from formatting differences, especially ones that could unintentionally favor certain models. Hume-to-Hume comparison data is not included in the leaderboard stats.

You're welcome to provide any character description; that description is passed to Claude, which generates the input text that is then sent uniformly to all TTS models. This keeps the formatting natural and consistent across evaluations.
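As a rough illustration of that flow, here is a minimal sketch, not the arena's actual code. The names `run_comparison`, `TextGenerator`, and `Synthesizer` are hypothetical, the text generator stands in for the call to Claude, and passing the character description to each model as its voice prompt is an assumption made for the example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type aliases, introduced only for this sketch.
TextGenerator = Callable[[str], str]       # character description -> input text
Synthesizer = Callable[[str, str], bytes]  # (input text, voice prompt) -> audio bytes


def run_comparison(
    description: str,
    generate_text: TextGenerator,
    synthesizers: Dict[str, Synthesizer],
) -> Tuple[str, Dict[str, bytes]]:
    """Generate the input text once, then send the same text to every model."""
    # One generation step, so no model receives differently formatted text.
    text = generate_text(description)
    # Assumption: the character description doubles as the voice prompt.
    outputs = {
        name: synthesize(text, description)
        for name, synthesize in synthesizers.items()
    }
    return text, outputs
```

The point of the design is that the text is produced once and shared, so any difference between the outputs comes from the models and not from the input.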

This setup also helps ensure the arena is used for its intended purpose, evaluating the expressiveness of model outputs, rather than functioning as a general-purpose TTS tool.

The entire codebase is public on GitHub. If there are specific parts that seem like they could bias the results, we'd really appreciate it if you could point them out.

Hume AI org

@ajd12342 Thank you for your interest in contributing to the arena! Since your original comment we have added OpenAI to the arena, and we have included additional head-to-head stats for the models on the leaderboard tab.

The main prerequisite for adding a model is that it must be accessible via an API that accepts both input text and a voice prompt to guide the delivery and characteristics of the generated speech.
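To make that prerequisite concrete, here is a minimal sketch of the shape such an integration might take. It is an illustration under assumptions, not the arena's actual interface: `SynthesisRequest`, `StylePromptedTTS`, `EchoTTS`, and `synthesize_with_all` are all hypothetical names, and `EchoTTS` is a stand-in that returns placeholder bytes instead of calling a real API.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Sequence


@dataclass(frozen=True)
class SynthesisRequest:
    text: str          # the input text to be spoken
    voice_prompt: str  # description guiding delivery and voice characteristics


class StylePromptedTTS(Protocol):
    """Hypothetical interface: anything that takes text plus a voice prompt."""

    name: str

    def synthesize(self, request: SynthesisRequest) -> bytes:
        ...


class EchoTTS:
    """Stand-in provider for illustration; a real one would call a TTS API."""

    name = "echo-tts"

    def synthesize(self, request: SynthesisRequest) -> bytes:
        # Placeholder payload so the sketch runs without network access.
        return f"{request.voice_prompt}|{request.text}".encode("utf-8")


def synthesize_with_all(
    providers: Sequence[StylePromptedTTS], request: SynthesisRequest
) -> Dict[str, bytes]:
    """Send the identical request to every provider and collect the audio."""
    return {p.name: p.synthesize(request) for p in providers}
```

A model that can be wrapped this way, for example behind a hosted inference endpoint, would meet the requirement described above.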

Beyond that, the arena code is open source under the MIT license, so you’re absolutely welcome to fork the repository and build your own version with any models you’d like to include!
