Apply for community grant: Academic project (gpu)

#1
by Plachta - opened

Hi dear HF team! 😃

This is an open-source implementation of Microsoft's latest text-to-speech model, VALL-E X 🎙️, from the paper "Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling". It is essentially a 24-layer, 1024-dimensional GPT-style model 💻. I have made a demo page for this model for your inspection and consideration 📋.

It takes about 60 s to synthesize 6 s of speech on the free CPU ⏳, but only 2~3 s on a single RTX 3060 ⚡. I sincerely hope you can grant GPU resources for this project's Hugging Face Space, so that more people get the chance to play with this awesome model 🚀.
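To put those timings in perspective, here is a rough sketch of the real-time factor (synthesis time divided by audio duration) on each device. The function name `real_time_factor` is only illustrative, and the sketch assumes the 2~3 s GPU figure refers to the same 6 s clip as the CPU figure.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Timings quoted above; the GPU range is assumed to be for the same 6 s clip.
AUDIO_SECONDS = 6.0
CPU_SECONDS = 60.0
GPU_SECONDS_FAST = 2.0
GPU_SECONDS_SLOW = 3.0

cpu_rtf = real_time_factor(CPU_SECONDS, AUDIO_SECONDS)
gpu_rtf_fast = real_time_factor(GPU_SECONDS_FAST, AUDIO_SECONDS)
gpu_rtf_slow = real_time_factor(GPU_SECONDS_SLOW, AUDIO_SECONDS)

# Speedup of the GPU over the free CPU, for the slow and fast ends of the range.
speedup_low = CPU_SECONDS / GPU_SECONDS_SLOW
speedup_high = CPU_SECONDS / GPU_SECONDS_FAST

print(f"CPU real-time factor: {cpu_rtf:.2f}")
print(f"GPU real-time factor: {gpu_rtf_fast:.2f} to {gpu_rtf_slow:.2f}")
print(f"GPU speedup over CPU: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under these assumptions the free CPU runs about 10 times slower than real time, while the RTX 3060 runs 2 to 3 times faster than real time, which is roughly a 20x to 30x speedup.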

Best regards! 🤗

Hello, and thank you for this great project:
I ran into a problem when using the downloaded pre-trained model. Following the basic usage instructions in the documentation, the generated audio is only blank noise. The same happens with audio generated from the user interface started with python -X utf8 launch-ui.py. However, the online demo you linked produces normal results, so I don't understand what went wrong. Vallex_checkpoint.pt and the vocos models have both been downloaded.
