How can I fine-tune this further?

#8
by conradgodfrey - opened

I'd like to fine-tune this on my own spectrograms with more diverse datasets. I'm particularly interested in tuning it with more vocal music, as the results for this aren't brilliant at the moment.

I also have the same question: how can we fine-tune this?

I'm going to guess you just have to fine-tune it like Stable Diffusion, so this might help: https://youtu.be/g9ibLuhXi1U

Using DreamBooth? It might be okay, but of course we'd need an extra step to make the spectrograms.

Interesting! Thanks for sharing the video.
I won't lie, I have zero intuition for how much compute it takes to fine-tune.
Looks like the guy in the video uses Colab to fine-tune for an hour, so I guess we'll just have to try this ourselves :)
It feels like it would be quicker to fine-tune on top of the existing Riffusion model, rather than fine-tune Stable Diffusion from scratch. I don't know if they've published how long it took to fine-tune it?

For the spectrogram step, this might also work: https://www.sonicvisualiser.org/

yeah I've already got a few spectrograms up my sleeve... can share some Python code that generates spectrograms from WAVs if anyone's interested?

Currently trying to train a model to score vocal performances in a singing competition that has a big historical score database, so I have a big dataset of WAVs (and scores).

You can use librosa. There are some very good examples on kaggle.
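For anyone who wants a starting point, here is a minimal sketch of turning audio samples into a grayscale spectrogram image. It uses only NumPy so it runs without extra dependencies; `librosa.stft` and `librosa.feature.melspectrogram` do the same job with more options. The function name `spectrogram_image`, the FFT size, the hop length, and the 80 dB dynamic range are all illustrative assumptions, and this is not Riffusion's own spectrogram pipeline.

```python
import numpy as np


def spectrogram_image(samples, n_fft=1024, hop=256):
    """Return a uint8 array (frequency bins x frames) of log-magnitudes.

    Hypothetical helper: parameter defaults are illustrative assumptions.
    """
    samples = np.asarray(samples, dtype=np.float64)
    # Pad very short clips so at least one full frame exists.
    if len(samples) < n_fft:
        samples = np.pad(samples, (0, n_fft - len(samples)))

    # Slice the signal into overlapping, Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack(
        [samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )

    # Magnitude of the short-time Fourier transform, frequency on axis 0.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T

    # Convert to decibels and keep the top 80 dB of dynamic range.
    db = 20.0 * np.log10(magnitude + 1e-10)
    db = np.clip(db, db.max() - 80.0, db.max())

    # Scale to 0..255 and flip so low frequencies sit at the bottom row.
    scaled = (db - db.min()) / (db.max() - db.min() + 1e-12)
    return np.flipud((scaled * 255).astype(np.uint8))


if __name__ == "__main__":
    sr = 22050  # assumed sample rate for the demo tone
    t = np.arange(sr) / sr
    image = spectrogram_image(np.sin(2 * np.pi * 440 * t))
    print(image.shape, image.dtype)
```

To run this on real files, load the WAV into a mono float array first (for example with `scipy.io.wavfile.read` or `librosa.load`) and pass the samples in. The resulting array can be saved as an image with any imaging library.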

Riffusion org

Additional fine-tuning and data information has been added to the model card. The model was trained using approaches similar to the Hugging Face examples, but fine-tuning can be achieved with very small datasets using a DreamBooth approach.

sethforsgren changed discussion status to closed

Thanks for your replies and guidance. How did you make the spectrograms? I ask because they look a little different from the output of the typical audio visualization software I've used before.
