Sylvain Filoni


AI & ML interests

ML for Animation

Blog posts


Posts 6

I'm happy to announce that ✨ Image to Music v2 ✨ is ready for you to try, and I hope you'll like it too! 😌

This new version has been crafted with transparency in mind,
so you can understand the process of translating an image to a musical equivalent.

How does it work under the hood? 🤔

First, we get a very literal caption from microsoft/kosmos-2-patch14-224. This caption is then given to an LLM agent (currently HuggingFaceH4/zephyr-7b-beta), whose task is to translate the image caption into a musical, inspirational prompt for the next step.

Once we have a nice musical text from the LLM, we can send it to the text-to-music model of your choice:
MAGNet, MusicGen, AudioLDM-2, Riffusion or Mustango
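
To make the three steps concrete, here is a minimal sketch of the flow (caption, then LLM rewrite, then text-to-music). It is not the code behind the demo: the function `image_to_music`, the `fake_*` stand-ins and the returned file names are all hypothetical, and the real steps would be remote calls to the models named above. It returns every intermediate result, in the spirit of keeping the process transparent.

```python
from typing import Callable, Dict

# Model names come from the post; the callables wired to them below are placeholders.
MODEL_NAMES = ("MAGNet", "MusicGen", "AudioLDM-2", "Riffusion", "Mustango")


def image_to_music(
    image_path: str,
    caption_fn: Callable[[str], str],
    rewrite_fn: Callable[[str], str],
    music_backends: Dict[str, Callable[[str], str]],
    model: str = "MusicGen",
) -> dict:
    """Run caption -> musical prompt -> audio and keep every intermediate result."""
    if model not in music_backends:
        raise ValueError(f"Unknown model {model!r}; choose from {sorted(music_backends)}")
    caption = caption_fn(image_path)       # step 1: literal image caption
    prompt = rewrite_fn(caption)           # step 2: LLM turns it into a musical prompt
    audio = music_backends[model](prompt)  # step 3: chosen text-to-music model
    return {"caption": caption, "prompt": prompt, "model": model, "audio": audio}


# Offline stand-ins (hypothetical) so the sketch runs without network access.
def fake_caption(path: str) -> str:
    return f"a literal description of {path}"


def fake_rewrite(caption: str) -> str:
    return f"a calm ambient track inspired by: {caption}"


fake_backends = {name: (lambda prompt, n=name: f"{n}.wav") for name in MODEL_NAMES}

result = image_to_music("sunset.png", fake_caption, fake_rewrite, fake_backends, model="Riffusion")
print(result["prompt"])
print(result["audio"])
```

Since the prompt is returned alongside the audio, you can inspect it, tweak it, and pass the edited text straight to a different backend.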

Unlike the previous version of Image to Music, which used the Mubert API and could output curious and obscure combinations, this version only uses open-source models available on the Hub, called via the Gradio API.

I'd also guess the musical result is closer to the atmosphere of the input image, thanks to the LLM agent step.

Pro tip: you can adjust the inspirational prompt to match your expectations, depending on the chosen model and its specific behavior 👌

Try it, explore the different models, and tell me which one is your favorite 🤗
→ fffiloni/image-to-music-v2

InstantID-2V is out! ✨

It's like InstantID, but you get a video instead. Nothing crazy here, it's simply a shortcut between two demos.

Here's how it works with the Gradio API:

1. We call InstantX/InstantID with a conditioning pose from a cinematic camera shot (example provided in the demo).
2. Then we send the generated image to ali-vilab/i2vgen-xl.
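
The two steps above can be sketched as a simple chain. This is only an illustration under assumptions: the helper names (`make_space_call`, `instantid_2v`), the argument lists, and the endpoint names in the commented lines are hypothetical, not the demo's actual code. Running `Client(...).view_api()` against each Space should list the real endpoints and their parameters, and recent `gradio_client` versions may require wrapping file inputs with `handle_file`.

```python
from typing import Any, Callable, Tuple


def make_space_call(space_id: str, api_name: str) -> Callable[..., Any]:
    """Return a function that forwards its arguments to one endpoint of a Space."""
    def call(*args: Any) -> Any:
        # Imported lazily so the rest of the sketch runs without gradio_client installed.
        from gradio_client import Client
        client = Client(space_id)
        return client.predict(*args, api_name=api_name)
    return call


def instantid_2v(
    face_image: str,
    pose_image: str,
    prompt: str,
    image_step: Callable[[str, str, str], str],
    video_step: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Step 1: identity + pose -> image. Step 2: that image -> video."""
    image = image_step(face_image, pose_image, prompt)
    video = video_step(image, prompt)
    return image, video


# Offline stubs (hypothetical) standing in for the two Spaces.
def stub_image_step(face: str, pose: str, prompt: str) -> str:
    return "portrait.png"


def stub_video_step(image: str, prompt: str) -> str:
    return image.replace(".png", ".mp4")


image, video = instantid_2v("face.jpg", "pose.jpg", "cinematic portrait",
                            stub_image_step, stub_video_step)
print(image, video)

# Wiring the real Spaces might look like this (endpoint names are assumptions):
# image_step = make_space_call("InstantX/InstantID", "/generate_image")
# video_step = make_space_call("ali-vilab/i2vgen-xl", "/image_to_video")
```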

Et voilà 🤗 Try it: fffiloni/InstantID-2V

Note that generation can take quite a while, so take the opportunity to brew yourself some coffee 😌
If you want to skip the queue, you can of course reproduce this pipeline manually