Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
fffiloni 
posted an update Feb 2
Post
I'm happy to announce that ✨ Image to Music v2 ✨ is ready for you to try and i hope you'll like it too ! 😌

This new version has been crafted with transparency in mind,
so you can understand the process of translating an image to a musical equivalent.

How does it works under the hood ? 🤔

First, we get a very literal caption from microsoft/kosmos-2-patch14-224; this caption is then given to a LLM Agent (currently HuggingFaceH4/zephyr-7b-beta )which task is to translate the image caption to a musical and inspirational prompt for the next step.

Once we got a nice musical text from the LLM, we can send it to the text-to-music model of your choice:
MAGNet, MusicGen, AudioLDM-2, Riffusion or Mustango

Instead of the previous version of Image to Music which used Mubert API, and could output curious and obscure combinations, we only provide open sourced models available on the hub, called via the gradio API.

Also i guess the music result should be more accurate to the atmosphere of the image input, thanks to the LLM Agent step.

Pro tip, you can adjust the inspirational prompt to match your expectations, according to the chosen model and specific behavior of each one 👌

Try it, explore different models and tell me which one is your favorite 🤗
—› fffiloni/image-to-music-v2

✨ Tried with this portrait of mine, there is something other-worldly about it... https://x.com/mvaloatto/status/1754404850664616240

·

Wow it sounds quite epic to me!

This is so cool! I can't stop playing with it!

This is pretty neat
image.png

hey, your previous version "Image-to-MusicGen" is or was much much better, with the use of "CLIP-Interrogator-2". 30 seconds was ok, could be longer, but it was perfect! i made very good music from my images and can you please fix it!? and again, if the track could be a bit longer than 30 seconds it would be crazy good!
thank you!