I'd help with your mission, if you truly believed in it. (~rant)

#3
by Pendrokar - opened

Decentralized yet unified efforts to accelerate research for Open Text to Speech (TTS) systems!

I forked your TTS Arena space. Unlike the current resource-limited TTS router, my fork only uses the Gradio Client to fetch the required parameters and synthesizes the audio in public HF Spaces.
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
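
To illustrate the approach, here is a minimal sketch of asking a public Space to synthesize a sentence through the Gradio Client. It is not the fork's actual code: the function name `synthesize_via_space`, the `extra_args` tuple and the `client_factory` hook are hypothetical, and the endpoint name and input order differ from Space to Space, so they have to be looked up per model.

```python
from typing import Any, Callable, Optional


def synthesize_via_space(
    space_id: str,
    text: str,
    api_name: str,
    extra_args: tuple = (),
    client_factory: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Send `text` to a public HF Space and return whatever its endpoint returns.

    `api_name` and `extra_args` are assumptions: each Space exposes its own
    endpoint name and its own order of inputs (voice, speed, and so on).
    """
    if client_factory is None:
        # Imported lazily so the sketch can be read without the dependency installed.
        from gradio_client import Client

        client_factory = Client

    client = client_factory(space_id)
    # The text is passed first here purely by assumption; check the Space's API page.
    return client.predict(text, *extra_args, api_name=api_name)
```

The synthesis itself runs on the hardware of the Space that hosts the model, so the Arena only needs to know each Space's ID, endpoint and inputs.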

I would add the features that I added there as pull requests if you're willing to become more open.

Many things are kept in the dark: the router, the TTS settings hidden there, the collected vote dataset. All this makes it hard for anyone to contribute to the Arena code. It seems @mrfakename is the only one really contributing to it. It also seems another fork of the Arena took my path of using the Gradio API to generate the audio samples:
https://huggingface.co/spaces/kotoba-tech/TTS-Arena-JA

Yet, @reach-vb denied my pull request about Gradio API without comment. 🤨
https://huggingface.co/spaces/TTS-AGI/TTS-Arena/discussions/30

I guess TTS-AGI doesn't allow access to the collected votes dataset because of user-submitted troll requests that got past the Detoxify test. But even if you did allow access, from the database structure I see that it would not help TTS developers, as the votes are detached from the text voted on. A TTS developer has no way of knowing why their model gets downvoted. 😵

About the TTS candidates that TTS-AGI chooses: you've added 8 to the original 6. I have no clue how TTS-AGI decided which to choose, as there was no strong push within the Arena's discussion forums. But among the Open TTS models, only 2 of the new ones managed to get past the original top three, StyleTTS 2 and MeloTTS. The other 5 Open TTS models fell well below a 1200 rating.

I've tried to get clarification... to no avail.
https://huggingface.co/spaces/TTS-AGI/TTS-Arena/discussions/15

Pinging other users that are involved in TTS-AGI and audio: @ylacombe @sanchit-gandhi @pyp1

Are you willing to become more open, so we can move closer to finding the best TTS models?

End of rant, start of propositions.

Could the cast vote dataset be made public if all the user-submitted spoken text were removed?

Though, as I mentioned previously, the collected text within the spokentext table is near useless, as the only other columns there are the unconnected ID and timestamp. Someone could try to recover some of the links by comparing the timestamp of a cast vote with the timestamp of a spokentext entry, but that would be very unreliable.
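
A rough sketch of such timestamp matching shows why it is unreliable. Everything here is an assumption for illustration: the function name `match_votes_to_texts`, the 60-second `max_gap`, and the premise that a text is logged before the vote on it. The real tables may behave differently.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def match_votes_to_texts(votes, texts, max_gap=timedelta(seconds=60)):
    """Guess which text each vote belongs to, using timestamps only.

    `votes` and `texts` are lists of datetimes. For every vote, return the
    index of the latest text logged at or before it within `max_gap`,
    or None when no text is close enough.
    """
    order = sorted(range(len(texts)), key=lambda i: texts[i])
    sorted_ts = [texts[i] for i in order]
    matches = []
    for vote_ts in votes:
        pos = bisect_right(sorted_ts, vote_ts) - 1  # latest text <= vote
        if pos >= 0 and vote_ts - sorted_ts[pos] <= max_gap:
            matches.append(order[pos])
        else:
            matches.append(None)
    return matches


# Two users overlap: A submits text 0, B submits text 1 five seconds later,
# then A votes (on text 0) and B votes (on text 1).
t0 = datetime(2024, 1, 1, 12, 0, 0)
texts = [t0, t0 + timedelta(seconds=5)]
votes = [t0 + timedelta(seconds=20), t0 + timedelta(seconds=25)]
matches = match_votes_to_texts(votes, texts)
print(matches)  # [1, 1]: A's vote is wrongly attributed to B's text
```

As soon as two users are active at the same time, the nearest-timestamp guess attributes votes to the wrong text. A shared ID between the two tables would avoid the guessing entirely.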
