How can I fine tune this further?

by conradgodfrey - opened Dec 18, 2022

Dec 18, 2022

I'd like to fine-tune this on my own spectrograms with more diverse datasets. I'm particularly interested in tuning it with more vocal music, as the results for vocals aren't brilliant at the moment.

Muhammadreza

Dec 19, 2022

I also have the same question, how can we fine tune this?

breadlicker45

Dec 19, 2022

My guess is that you just have to fine-tune it like Stable Diffusion, so this might help: https://youtu.be/g9ibLuhXi1U

Muhammadreza

Dec 20, 2022

Using dreambooth? It might be okay, but we of course need an extra step of making spectrograms.

conradgodfrey

Dec 20, 2022

Interesting! Thanks for sharing the video.
I won't lie, I have zero intuition for how much compute it takes to fine-tune.
It looks like the guy in the video uses Colab to fine-tune for an hour, so I guess we'll just have to try this ourselves :)
It feels like it would be quicker to fine-tune on top of the existing Riffusion model rather than fine-tune Stable Diffusion from scratch. Have they published how long their own fine-tuning took?

breadlicker45

Dec 20, 2022

For making the spectrograms, this might also work: https://www.sonicvisualiser.org/

conradgodfrey

Dec 20, 2022

Yeah, I've already got a few spectrograms up my sleeve... I can share some Python code that generates spectrograms from WAVs if anyone's interested?

I'm currently trying to train a model to score vocal performances in a singing competition that has a big historical score database, so I have a big dataset of WAVs (and scores).

Muhammadreza

Dec 20, 2022

You can use librosa to generate the spectrograms. There are some very good examples on Kaggle.
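
For anyone who wants to see what the WAV-to-spectrogram step involves, here is a minimal sketch. It is not the code offered earlier in the thread, not librosa, and not Riffusion's own pipeline: it is a plain short-time Fourier transform written with NumPy. The function names and the `n_fft`, `hop_length`, and `top_db` values are illustrative assumptions, and the WAV reader only handles 16-bit PCM.

```python
import wave

import numpy as np


def read_mono_wav(path):
    """Read a 16-bit PCM WAV file and return (samples in [-1, 1], sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        data = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")
    # Average the channels to get mono, then scale to floats.
    return data.reshape(-1, channels).mean(axis=1) / 32768.0, rate


def magnitude_spectrogram(samples, n_fft=512, hop_length=128):
    """Return an array of STFT magnitudes with shape (n_fft // 2 + 1, n_frames)."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim != 1:
        raise ValueError("expected mono audio (1-D array)")
    if len(samples) < n_fft:
        samples = np.pad(samples, (0, n_fft - len(samples)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop_length
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([
        samples[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_uint8_image(spec, top_db=80.0):
    """Convert magnitudes to decibels and scale to 0..255, low frequencies at the bottom."""
    db = 20.0 * np.log10(np.maximum(spec, 1e-10))
    db = np.maximum(db, db.max() - top_db)  # clip the dynamic range
    scaled = (db - db.min()) / max(db.max() - db.min(), 1e-12)
    return np.flipud((scaled * 255).astype(np.uint8))
```

The resulting array can be saved as a grayscale image with any imaging library. Note that this uses a linear frequency axis and arbitrary settings, so the images will not necessarily match what the model was trained on.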

sethforsgren

Riffusion org Dec 21, 2022

Additional fine-tuning and data information has been added to the model card. The model was trained using approaches similar to the Hugging Face examples, but fine-tuning can be achieved with very small datasets using a DreamBooth approach.
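
The thread does not spell out how the training data should be laid out. As one possible sketch, assuming the `imagefolder` convention from the Hugging Face `datasets` library (a `metadata.jsonl` file with a `file_name` field per image), spectrogram PNGs could be paired with text prompts like this. The `write_metadata` helper, the `text` caption key, and the file names are hypothetical and only for illustration.

```python
import json
from pathlib import Path


def write_metadata(image_dir, captions, caption_key="text"):
    """Write metadata.jsonl next to the images.

    `captions` maps an image file name to its text prompt.
    Returns the number of entries written.
    """
    image_dir = Path(image_dir)
    written = 0
    with (image_dir / "metadata.jsonl").open("w", encoding="utf-8") as f:
        for png in sorted(image_dir.glob("*.png")):
            caption = captions.get(png.name)
            if caption is None:
                continue  # skip images that have no caption
            record = {"file_name": png.name, caption_key: caption}
            f.write(json.dumps(record) + "\n")
            written += 1
    return written
```

For example, `write_metadata("spectrograms", {"clip_001.png": "solo soprano, a cappella"})` would write one line for `clip_001.png`, provided that file exists in the folder.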

sethforsgren changed discussion status to closed Dec 21, 2022

Muhammadreza

Dec 25, 2022

Thanks for your replies and guidance. How did you make the spectrograms? I found they're a little bit different from the output of the typical audio visualization software I've used before.
