Recommended caption style for fine-tuning

#2
by wnmurphy - opened

Outstanding work.

I'm interested in fine-tuning, and I have a question about how to structure my inputs (captions). Were the audio models trained on inputs that look like the automatically generated captions?

I'm wondering whether I'll get better results by sticking to that format (i.e., with BPM and key kept separate from the caption rather than included in it) or by generating captions that contain all of the information, including BPM and key, plus much more detail than the generated captions provide.

Also, I'll keep looking, but could you point me to where the prompt for the LM's caption generation lives in the repo?

Much appreciated.
