Apply for community grant: Personal project

by kevinwang676 - opened

Combine the powerful text-to-audio model Bark with real-time voice cloning to generate highly realistic audio in a custom voice uploaded by the user.

Hi, we tried this but the processing on the cloning part seems to be way too fast, and although it changes the voice timbre and quality a bit, it sounds absolutely nothing like the reference voice. Is it only supposed to replicate voice tone and timbre, or accent and other aspects of the voice as well? If the latter, it doesn't seem to be working at all for us.

Hi, the voice cloning part requires you to upload longer audio (~90s) as the reference audio in order to improve the quality of the cloned speech. You can check out the demo of YourTTS here: Thanks for reaching out!
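In case it helps, here is a minimal sketch for checking whether a reference clip meets the ~90s suggestion before uploading it. It assumes the clip is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The function names and the `MIN_REFERENCE_SECONDS` constant are hypothetical helpers for illustration, not part of Bark, YourTTS, or this space.

```python
import wave

# Assumption: threshold taken from the "~90s" suggestion in this thread.
MIN_REFERENCE_SECONDS = 90.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether the reference clip is at least `min_seconds` long."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("reference.wav")` (a hypothetical file name) returns `False` for a clip shorter than 90 seconds, which might explain a clone that sounds unlike the reference voice.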

Looks like a good idea; I haven't tested it too well, but since Bark is a pain to deal with when fine-tuning, this could be a good project to give a GPU to!

Thanks for your comments! I've added an example of voice cloning to the space. Please check it out. It would be amazing if this space can be used to demo Bark with voice cloning and for people to try it.

CPU inference, not only for Bark but also for cloning, takes very long (400+ seconds for a 3-second output).

Sorry for the inconvenience. That's also the reason why I'd like to apply for community grant 😂 However, you can always duplicate and use it with a GPU in your own space.

