mrfakename PRO
mrfakename's activity
Enter text and vote on which model is superior!
TTS-AGI/TTS-Arena
- OpenVoice V2
- Play.HT 2.0
About the TTS Arena
The TTS Arena is an open-source arena where you can enter a prompt, have two models generate speech, and vote on which one is superior.
We compile the votes into an automatically updated leaderboard to help developers select the best model.
We've already included models such as ElevenLabs, XTTS, StyleTTS 2, and MetaVoice. The more votes we collect, the sooner we'll be able to show these new models on the leaderboard and compare them!
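For anyone curious how pairwise votes can become a ranking, here is a minimal sketch. It assumes an Elo-style update, which is only one common choice; the post does not say which method the Arena uses. The constants `K` and `BASE`, the function names, and the sample votes are all made up for illustration.

```python
from collections import defaultdict

K = 32          # assumed update step size (not from the post)
BASE = 1000.0   # assumed starting rating for every model


def expected(r_a, r_b):
    """Probability that a model rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def leaderboard(votes):
    """Turn (winner, loser) votes into a list of (model, rating), best first."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in votes:
        e_w = expected(ratings[winner], ratings[loser])
        delta = K * (1.0 - e_w)      # larger gain for an unexpected win
        ratings[winner] += delta
        ratings[loser] -= delta      # zero-sum: the loser gives up the same amount
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes with placeholder model names, not real Arena data.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]

for name, rating in leaderboard(votes):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes arrive, so a real leaderboard might instead refit the ratings over all votes at once.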
OpenVoice V2
OpenVoice V2 is an open-sourced speech synthesis model created by MyShell AI that supports instant zero-shot voice cloning. It's the next generation of OpenVoice, and is fully open-sourced under the MIT license.
https://github.com/myshell-ai/OpenVoice
Play.HT 2.0
Play.HT 2.0 is a high-quality proprietary text-to-speech engine. Accessible through their API, this model supports zero-shot voice cloning.
Compare the models on the TTS Arena:
TTS-AGI/TTS-Arena
Anyone who's written a paper can post according to AK
The model was released via torrent, a method Mistral has often used for its recent releases. The license has not been confirmed yet, but a moderator on their Discord server suggested yesterday that it is Apache 2.0.
Sources:
• https://twitter.com/_philschmid/status/1778051363554934874
• https://twitter.com/reach_vb/status/1777946948617605384
Curious to see how they compare with other leading models? Vote on the TTS Arena ⬇️
TTS-AGI/TTS-Arena
MeloTTS, released by MyShell AI, provides realistic and lifelike text-to-speech while remaining efficient and fast, even when running on CPU. It supports a variety of languages, including but not limited to English, French, Chinese, and Japanese.
StyleTTS 2 is another fully open-source text-to-speech framework. It's permissively licensed, highly efficient, and supports voice cloning and longform narration. It also provides natural and lifelike speech.
Both are available now to try on the TTS Arena - vote to find which one is better! The leaderboard will be revealed once we collect enough votes.
The filter should be more relaxed now, please let me know if it's working better!
The TTS Arena, inspired by LMSys's Chatbot Arena, allows you to enter text which will be synthesized by two SOTA models. You can then vote on which model generated the better sample. The results will be published on a publicly accessible leaderboard.
We've added several open access models, including Pheme, MetaVoice, XTTS, OpenVoice, & WhisperSpeech. The Arena also includes the proprietary ElevenLabs model.
If you have any questions, suggestions, or feedback, please don't hesitate to DM me on X (https://twitter.com/realmrfakename) or open a discussion in the Space. More details coming soon!
Try it out: TTS-AGI/TTS-Arena
Model: HuggingFaceTB/cosmo-1b
Dataset: HuggingFaceTB/cosmopedia
Hi,
How are you getting the comments? Have they previously been scraped, or are you using the Reddit API, or is this in partnership with Reddit?
Thanks!
Nice! How did you use UNA w/ Axolotl?
Congrats! So they're going to run an 11B model on a laptop? Or will it be quantized?
Amazing! Might it be possible to delete just one image, instead of having to clear all of them?
Thanks!
Congratulations! I thought HF runs on AWS, are you planning to switch to Google Cloud? Will this impact the super-fast AWS->HF upload speeds?
For model merging on low VRAM:
Here's a HF Space for easier usage:
Nice! @winglian do you know what the largest model you can fit on a single 24GB GPU (w/o LoRA/QLoRA) is?
Nice, looks really cool! Any plans to open source UNA @fblgit ?
Hello! How can I create a post on HF?