Recommended caption style for fine-tuning

by wnmurphy - opened 2 days ago

Outstanding work.

I'm interested in fine-tuning and have a question about how to structure my inputs (captions). Were the audio models trained on input that looks like the automatically generated captions?

I'm wondering whether I'll get better results by sticking to that format (i.e., BPM and key kept separate from the caption rather than embedded in it) or by writing captions that contain everything, including BPM and key, plus much more detail than the generated captions provide.

Also, I'll look for it more, but could you point me to where the prompt for the LM's caption generation lives in the repo?

Much appreciated.
