Methodologies and Dataset
The results are so funny! I wanted to know how you trained the SDXL model on your custom dataset. I'm a beginner at training models and want to learn. Could you please share some resources? I'm planning to build a project similar to this.
I had a hard time figuring out how to do this, as the available instructions are pretty scarce or incomplete. I'm away from the computer I used to train the model, but let me share a screenshot of my directory structure tonight along with some tips. The directory structure is almost the key thing, along with the captioning in the directory containing the files. Once you have that set up, you can use kohya for the training; it's pretty straightforward. I used a local GPU. My original intent was to play with small LLMs, and I've been sidetracked with images!
Thank you so much, that would be really helpful. Did you create the dataset from scratch, or was it already available?
I trained on images posted to the net. It doesn't take a huge number of images. I'll try to post a screenshot of my directory structure, and I think that will show how many images I used.
Oh okay, thank you!
Here are some further details; I hope this helps. I used kohya to train, specifically the LoRA tab and not the Dreambooth tab. I have uploaded my kohya settings file so you can look at it. You can load it into kohya and it should give you something to start with.
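For context on what those settings map to: the kohya GUI is a front end for the train_network scripts in kohya-ss/sd-scripts. Here's a minimal sketch of an equivalent SDXL LoRA command line, if you ever want to skip the GUI. The paths and hyperparameter values here are placeholder assumptions, not the values from my settings file:

```
# Sketch of an SDXL LoRA training run via kohya-ss/sd-scripts.
# Paths and hyperparameters are illustrative placeholders, not my real settings.
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="/models/sd_xl_base_1.0.safetensors" \
  --train_data_dir="/data/m0nstrcrz/img" \
  --output_dir="/data/m0nstrcrz/model" \
  --output_name="m0nstrcrz_lora" \
  --network_module=networks.lora \
  --network_dim=32 --network_alpha=16 \
  --resolution="1024,1024" \
  --train_batch_size=1 \
  --max_train_epochs=10 \
  --learning_rate=1e-4 \
  --save_every_n_epochs=1
```

The settings file you load into the GUI is just a friendlier way of filling in these same options.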
Here is my directory structure (a sketch of the layout is below). The number in front of the image directory is important: it's how many times each image is repeated per epoch. You can play around with that to see what works best for you; I've used different numbers there, 2 or 3 or more. Some people set it from a calculation, like 1500 divided by the number of training images, but I don't know that it always works that way. It looks like I had 243 images to train this model, but you might not need nearly so many depending on what you are training.
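Since the screenshot isn't embedded here, a sketch of the layout. kohya expects the training images in a subfolder named `<repeats>_<trigger> <class>`; apart from that naming convention, the trigger word, and the image count, everything below is a placeholder:

```
m0nstrcrz_training/
└── img/
    └── 3_m0nstrcrz caricature/    # "3" = times each image repeats per epoch
        ├── image_001.png
        ├── image_001.txt          # caption file paired with image_001.png
        ├── image_002.png
        ├── image_002.txt
        └── ...                    # ~243 image/caption pairs in my case
```

For what it's worth, the 1500-steps rule of thumb would suggest 1500 / 243 ≈ 6 repeats here, but as I said, 2 or 3 has worked fine for me.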
I used WD14 captioning on this dataset. You can go through the files and make sure the captions make sense; I think I was hurrying on this one and didn't do that, but the results were good enough, so I called it good. In the captioning settings you can specify a prefix, and I used the following (a completely weird trigger word, so that SD learns a new class rather than relying on what it thinks it knows about an existing class it has already been trained on): m0nstrcrz, caricature,
Run it and see what you get. If you repeatedly see tags you don't want, you can also specify things to exclude from the captions. There is a Windows program that lets you scroll through your captions, but I am on Linux, so I usually just use the command line to fiddle with things. You can also use BLIP captioning if you want more English-style captions; I have been using more of those lately, but I'm not sure which is best. Here is a sample of a WD14 caption for this model:
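(The original sample didn't survive the thread; for a sense of the format, WD14 output is a comma-separated, Danbooru-style tag list stored in a .txt file next to each image. With the prefix above, a caption file would look roughly like the line below, where every tag after the prefix is invented for illustration rather than taken from the actual dataset.)

```
m0nstrcrz, caricature, 1boy, solo, smile, glasses, portrait, simple background
```

BLIP, by contrast, would produce a natural-language sentence along the lines of "a caricature drawing of a smiling man with glasses" instead of tags.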
When you run the training, you can also have it generate sample images periodically, e.g., after each epoch or after X number of steps. I usually do it every one or two epochs, depending on how long they take; this shows me whether I'm getting something reasonable. Under the LoRA tab / Parameters / Samples you can specify that and give some prompts for generating the images.
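Under the hood, the sd-scripts trainer reads those prompts from a plain text file, one prompt per line, with optional generation flags after the prompt. A sketch, where the prompt wording is my own invention but the option letters are the ones sd-scripts recognizes:

```
# sample_prompts.txt -- prompt text here is illustrative, not from my actual run.
# --n negative prompt, --w/--h size, --d seed, --l CFG scale, --s steps
m0nstrcrz, caricature, 1boy, smiling, portrait --n lowres, bad anatomy --w 1024 --h 1024 --d 42 --l 7 --s 28
```

A fixed seed (`--d`) is handy here: successive samples then differ only by training progress, so you can watch the concept converge across epochs.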
My settings might run the training for a lot of epochs. It depends on the model, but too many epochs can produce a LoRA that makes SD really bad at a lot of other things. So it's a balancing act: just enough training for it to learn your concept well (and play well with other models) without destroying the rest of Stable Diffusion.
Lots of information to process 😀. I'll start with this and will let you know if I have any further questions. Thank you so much.