Apply for community grant: Academic project

#1
by lauraibnz - opened

MIDI-AudioLDM is a MIDI-conditioned text-to-audio model based on the open-source project AudioLDM. The model has been conditioned using the ControlNet architecture and has been developed within Hugging Face’s Diffusers framework. Once trained, MIDI-AudioLDM accepts a MIDI file and a text prompt as input and returns an audio file, which is an interpretation of the MIDI based on the given text description. This enables detailed control over different musical aspects such as notes, mood and timbre.
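For a rough idea of the intended usage, here is a minimal inference sketch; the checkpoint id, pipeline loading, and the name of the MIDI argument are assumptions for illustration, not the project's confirmed API:

```python
# Hypothetical inference sketch; the repo id and argument names below are
# assumptions, not the project's confirmed API.
import torch
from diffusers import DiffusionPipeline

# Load the MIDI-conditioned AudioLDM checkpoint (assumed repo id).
pipe = DiffusionPipeline.from_pretrained(
    "lauraibnz/midi-audioldm", torch_dtype=torch.float16
).to("cuda")

# Generate audio from a MIDI file plus a text prompt describing the timbre.
result = pipe(
    prompt="violin",        # text prompt steering mood/timbre
    midi_file="test.mid",   # assumed name of the MIDI conditioning argument
    num_inference_steps=20,
)
audio = result.audios[0]    # waveform array, as in AudioLDMPipeline outputs
```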

The project is being developed as part of a Master's Thesis in Artificial Intelligence Research at UNED and will be presented within the Project Area at Sónar+D 2023. This is ongoing research, and the current checkpoint will soon be replaced by a more stable version of the model.

very cool @lauraibnz , could you please add some examples?

This is an example from the test set. The following would be the piano roll extracted from the MIDI file:

[test.png: piano roll extracted from the MIDI file]

This is audio generated using this piano roll as conditioning, with the prompt "piano":

And the same but with the prompt "violin":
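As an aside, a piano roll like the one above can be extracted with pretty_midi; this is just a sketch, the project's actual preprocessing may differ, and `test.mid` is an assumed file name rather than the actual test-set file:

```python
import matplotlib.pyplot as plt
import pretty_midi

# Load the MIDI file and sample its piano roll at 100 frames per second.
pm = pretty_midi.PrettyMIDI("test.mid")  # assumed file name
piano_roll = pm.get_piano_roll(fs=100)   # array of shape (128 pitches, n_frames)

# Plot pitch against time, similar to the image above.
plt.imshow(piano_roll, aspect="auto", origin="lower")
plt.xlabel("time frame")
plt.ylabel("MIDI pitch")
plt.show()
```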

very cool! I meant also adding examples to your interface: https://gradio.app/docs/#interface (see the `examples` parameter). You can also set `cache_examples=True`.
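For illustration, wiring examples into a Gradio interface looks roughly like this; the `generate` stub and file names are placeholders, not the Space's actual code:

```python
import gradio as gr

def generate(midi_file, prompt):
    # Placeholder: the real Space would run the MIDI-AudioLDM pipeline here
    # and return the path to the generated audio file.
    return None

demo = gr.Interface(
    fn=generate,
    inputs=[gr.File(label="MIDI file"), gr.Textbox(label="Prompt")],
    outputs=gr.Audio(label="Generated audio"),
    # Example inputs shown under the interface; with cache_examples=True
    # the outputs are computed once at startup and served from cache.
    examples=[["test.mid", "piano"], ["test.mid", "violin"]],
    cache_examples=True,
)
demo.launch()
```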

btw, what kind of prompts would work here? do you have a dataset of examples?

I will add more examples soon, as I might replace the model with an updated version in the coming days. This model uses cvssp/audioldm-m-full as a starting point and has been trained and fine-tuned with audio embeddings rather than text prompts. At inference it does use prompt embeddings and accepts any text as input, just as AudioLDM does. However, the fine-tuning was carried out on a subset of the Slakh and URMP datasets, so more musical input works best.

I added some examples and a short description for each parameter.

It would be very useful for me if a GPU were assigned to this Space, as I will be showing it as a demo of my academic project at Sónar+D over the coming days. I will continue to update the Space and checkpoint, and will submit a PR to the diffusers code once I make some final changes. Thank you!

hi @lauraibnz , we assigned a T4 with a 10-hour sleep time. If no requests are made, the Space goes to sleep, and if you need it again you need to restart it.
Please share about Hugging Face and the grant at the Sónar+D event!! 🙏 We'd appreciate it, also on social media. Thanks!
ps. Sónar+D is a great event! congrats and good luck

ps. go to your Space settings, select the T4 small, and assign it as community

[image.png: screenshot of the Space hardware settings]

Hi @radames , thanks a lot! I will share about Hugging Face and the grant :) In my settings I already see it as "current - free grant", so I guess that's already set up? Thanks again

yes, it's active!

radames changed discussion status to closed

Hi @radames , I have been writing my Master's thesis about this project over the last few months, so I haven't been using the Space that often. However, on the 23rd of September I will be giving a talk/workshop about it at Volumens Festival in Spain, and the week of the 25th I will be presenting the thesis at my university. For those weeks it would be very helpful to go back to a longer sleep time, if possible.

Thanks a lot! Best,

Laura

lauraibnz changed discussion status to open

Hi @radames , let me know when you get a chance whether this would be possible. Thanks again!

Best,

Laura

hi @lauraibnz , sorry for the delay. I don't think I can change the sleep time, but you can always restart the Space, and every time you interact with it the timer resets; if other people are interacting with it, it won't sleep.

Ok @radames ! Thank you anyway

lauraibnz changed discussion status to closed
