Apply for community grant: Academic project

#1
by soujanyaporia - opened
Deep Cognition and Language Research (DeCLaRe) Lab org
•
edited Apr 30, 2023

TANGO is a latent diffusion model (LDM) for text-to-audio (TTA) generation. TANGO can generate realistic audio, including human sounds, animal sounds, natural and artificial sounds, and sound effects, from textual prompts. We use the frozen instruction-tuned LLM Flan-T5 as the text encoder and train a UNet-based diffusion model for audio generation. We perform comparably to current state-of-the-art TTA models on both objective and subjective metrics, despite training the LDM on a dataset 63 times smaller. We release our model, training and inference code, and pre-trained checkpoints for the research community.

We hereby request HF to give us GPUs to run this space.

tango.png

Meanwhile, we are also reaching out to companies for GPU resources to train Tango on larger datasets such as AudioSet, so that Tango can generate a wider range of audio.

-Team Tango

hollownight changed discussion status to closed
hollownight changed discussion status to open
Deep Cognition and Language Research (DeCLaRe) Lab org

Dear HF,

Let us know the status of our application, please. Thanks.

-Team Tango

Hi @soujanyaporia, we have assigned a GPU to this Space. Note that GPU grants are provided temporarily and might be removed after some time if usage is very low.

To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus

Deep Cognition and Language Research (DeCLaRe) Lab org

Hi @akhaliq, a zillion thanks :)

soujanyaporia changed discussion status to closed
