Thoughts on LoRA Training #1

Community Article · Published June 18, 2024

I talk to many people, from a variety of backgrounds, about training LoRAs. Some are very new to it, while others are well established with impressive model portfolios. I aim to make this a series of posts, and possibly a longer article, discussing my thoughts on LoRA training and my suggestions.

For my part, I would consider myself a curatorial, artistically driven finetuner. The technical specs that concern me most are the ones that start with the dataset: how the concepts in the dataset relate to each other and to the text captions, and how that functions within the context of the model being finetuned. My focus on the technical parameters is therefore very utilitarian; I want parameters that work well enough that I can leverage the dataset and visual information to achieve strong results, as evidenced by my elementary use of syntax in this article.

One of the biggest issues I often observe is people trying to do too much all at once: coming into a training situation without any previous fine-tuning or art experience and attempting to adjust every parameter and setting simultaneously.

I have seen different fine-tuning parameters achieve equally impressive results. The difference often lies more in the quality of the dataset and the accompanying text captions, and how the decisions you make about these two elements relate to the parameters you are using.

Here are some of my rules of thumb, which I have used on various training setups with generally good results:

  • I use 20-30 images for style training and 10-20 images for a character.

  • I use a mix of captions: about one-third in a narrative sentence structure, one-third as a long list of attributes seen in the related image, and one-third as a single word (the first sketch after this list shows the split). I have done this both with and without a unique token, but I prefer including a unique token in case I want to add extra weight.

  • I find that datasets that are primarily AI-generated take less time to train.

  • Illustrated datasets that are unique or minimalistic take more time to train, regardless of whether they are AI-generated.

  • Handmade/human-origin datasets take longer to train.

  • For SDXL, some of the symptoms of overfitting also appear in underfitting (loss of fidelity, linework degradation, etc.). However, I find it easier to tell them apart by shortening my training and rerunning it: if the style breaks and becomes more generic, the run was underfit; if it resolves, it was overfit. (The second sketch after this list walks through this check.)
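
To make the caption split concrete, here is a rough Python sketch. It assumes the common sidecar-.txt convention (one caption file next to each image, which kohya-style trainers read); everything specific in it is a placeholder I made up, including the "myst_style" token, the folder path, and the example description and tags, which you would of course write per image.

```python
# A rough sketch of the caption mix above: ~1/3 narrative, ~1/3 attribute
# list, ~1/3 single word. Assumes sidecar .txt captions (one file per
# image, the convention kohya-style trainers read). Everything specific
# below is a made-up placeholder, including the unique token.
from pathlib import Path

TOKEN = "myst_style"  # hypothetical unique token, kept for extra weighting

def build_caption(description: str, tags: list[str], style: str) -> str:
    """Return one caption for a single image in the requested style."""
    if style == "narrative":
        return f"{TOKEN}, {description}"
    if style == "attributes":
        return f"{TOKEN}, " + ", ".join(tags)
    return TOKEN  # "single word": just the unique token on its own

# In a real dataset these would be written (or tagged) per image; they
# are invented placeholders here.
description = "a watercolor illustration of a fox resting under a pine tree"
tags = ["watercolor", "fox", "pine tree", "dusk", "muted palette", "soft grain"]

# Cycle the three styles across the folder so each covers roughly a third.
styles = ["narrative", "attributes", "single"]
for i, image in enumerate(sorted(Path("dataset/style").glob("*.png"))):
    image.with_suffix(".txt").write_text(build_caption(description, tags, styles[i % 3]))
```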
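
And here is the shorten-and-rerun check from the last bullet, written out as a rough Python sketch. The `train` and `sample_quality` hooks are stand-ins for whatever trainer and eyeballing process you actually use (kohya scripts, a GUI, etc.), and the step count is an invented example, not a recommendation.

```python
# A rough sketch of the shorten-and-rerun check. `train` and
# `sample_quality` are stand-ins (not a real library API) for whatever
# trainer and manual inspection you use.

def diagnose_fit(train, sample_quality, full_steps: int = 2000) -> str:
    """Disambiguate overfitting from underfitting on a suspect run.

    train(steps)         -> a LoRA checkpoint trained for `steps` steps
    sample_quality(ckpt) -> "generic" if the style broke down,
                            "resolved" if fidelity/linework recovered
    full_steps is an invented example length, not a recommendation.
    """
    short_ckpt = train(full_steps // 2)  # rerun at roughly half the length
    if sample_quality(short_ckpt) == "generic":
        # Less training made the style *more* generic, so the original
        # symptoms came from underfitting: train longer or strengthen
        # the dataset/captions.
        return "underfit"
    # Less training cleared the artifacts, so the original run was
    # overfit: keep the shorter checkpoint.
    return "overfit"
```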

These are my general thoughts. I am not the best at planning out this kind of content for myself—ironically, as most of my work involves creating similar content for companies—but I will keep trying to expand on this with examples.

You can find a follow-up on where to train here: https://huggingface.co/blog/alvdansen/thoughts-on-lora-training-pt-2-training-services