alea31415/YuriDiffusion · Hugging Face

I have decided to stop creating a separate repository for each model, so most future models will go here. I will only create a dedicated repository for more important projects. Despite the name YuriDiffusion, I am not sure whether I will really train such a model. Dataset collection and the hard limits of SD both make this task very challenging.

List of models

- suremio-nozomizo-eilanya-maplesally

- onimai mahiro and mihari

- grass wonder from umamusume

Questions that I have partial answers to

Can we make an image of multiple known characters?

This is possible through native training, LoRA, or merging.

Can we use embedding, lora, and native training together?

Interestingly, an independently trained LoRA and embedding, or an independently fine-tuned model and embedding, go hand in hand. On the other hand, applying a LoRA on top of a model fine-tuned for the same character does not seem to give good results.

Questions that I do not have answers to

Native training or LoRA?

I tried both for suremio-nozomizo-eilanya-maplesally and grass wonder from umamusume, so you can compare them. Clearly, LoRA has certain advantages:

- Faster to train
- Lower VRAM requirement
- Smaller size

Moreover, with the same number of steps, LoRA seems to lead to better fidelity if trained with a large learning rate (1e-4). Nonetheless, this can also be a sort of overfitting, and it is unclear whether we are just trading flexibility for fidelity here. In fact, I observe that applying a LoRA can have a significant impact on the style of the base model. The question is whether we can find a better trade-off with a smaller learning rate.
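To make the size advantage and the effect on the base model concrete, here is a minimal sketch of the usual LoRA formulation, where a weight matrix W is replaced by W + scale * (B @ A) with two small factors B and A. The layer dimensions, rank, and the function names below are assumptions chosen for illustration, not values taken from the models in this repository.

```python
# Toy illustration of a low-rank (LoRA-style) weight update in pure Python.
# All names and dimensions here are hypothetical.

def lora_param_count(d_out, d_in, rank):
    """Parameters stored in the factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def full_param_count(d_out, d_in):
    """Parameters in the full weight matrix W (d_out x d_in)."""
    return d_out * d_in

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A). A larger scale moves W further from the base model."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Assumed example: one 768 x 768 projection with rank 8.
d_out, d_in, rank = 768, 768, 8
ratio = lora_param_count(d_out, d_in, rank) / full_param_count(d_out, d_in)
print(f"LoRA stores {ratio:.1%} of the parameters of the full matrix")
```

Because the update is added on top of every adapted weight, a strongly trained LoRA (or a large scale at inference) shifts the whole layer, which is one plausible reason the style of the base model changes.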

Another advantage, which is not inherent to the method, is that we can now use a LoRA directly with any base model. This should be possible for normally trained models as well through add-difference merging, but unfortunately the current interface does not support on-the-fly merging.
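For reference, add-difference merging can be sketched as follows: the difference between a fine-tuned model and the base it was trained from is added to another model. This is a toy version on dictionaries of flat weight lists, assuming the common formula target + multiplier * (finetuned - base); the function and key names are hypothetical, and real checkpoints would hold tensors instead of lists.

```python
# Toy add-difference merge on "state dicts" represented as dict[str, list[float]].
# All names are hypothetical; this only illustrates the arithmetic.

def add_difference(target, finetuned, base, multiplier=1.0):
    """Return target + multiplier * (finetuned - base), key by key.

    Keys missing from either finetuned or base are copied from target unchanged.
    """
    merged = {}
    for name, weights in target.items():
        if name in finetuned and name in base:
            merged[name] = [
                t + multiplier * (f - b)
                for t, f, b in zip(weights, finetuned[name], base[name])
            ]
        else:
            merged[name] = list(weights)
    return merged

# Assumed example: transfer what "finetuned" learned relative to "base" onto "target".
target = {"w": [1.0, 2.0], "extra": [5.0]}
finetuned = {"w": [0.5, 0.5]}
base = {"w": [0.0, 1.0]}
merged = add_difference(target, finetuned, base)
print(merged)
```

Doing this on the fly would amount to running the same computation at load time rather than saving a merged checkpoint first.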

Clip skip 1 or 2?

I played with this in my EuphiAni model, but I cannot really judge whether training with clip skip 1 or 2 is better. Changing the prompt can always make things more favorable to one model than another. More surprisingly, I observe that even for a model trained with clip skip 2, it may still be better to do inference with clip skip 1. This is not so much the case for LoRA, which can be explained by the fact that LoRA quickly biases the model towards the training distribution, so it is important to match training and inference settings. With native training, on the other hand, the model still retains its capacity to do inference at clip skip 1.
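As a reminder of what the setting means, clip skip N is commonly understood as taking the text encoder output N layers from the end, so clip skip 1 uses the last layer and clip skip 2 the penultimate one. The sketch below uses a toy stack of layers rather than a real CLIP text encoder; the `encode` function and the layers are hypothetical, and real implementations may additionally apply a final normalization that is omitted here.

```python
# Toy illustration of "clip skip": pick the output of the N-th layer from the end.
# The layers are stand-ins, not a real text encoder.

def encode(tokens, layers, clip_skip=1):
    """Run tokens through all layers and return the output clip_skip layers from the end."""
    if not 1 <= clip_skip <= len(layers):
        raise ValueError("clip_skip must be between 1 and the number of layers")
    hidden = tokens
    outputs = []
    for layer in layers:
        hidden = layer(hidden)
        outputs.append(hidden)
    return outputs[-clip_skip]

# Three toy layers that each add a different constant, so outputs are easy to tell apart.
layers = [lambda h, k=k: [x + k for x in h] for k in (1, 10, 100)]

last = encode([0], layers, clip_skip=1)         # output of the final layer
penultimate = encode([0], layers, clip_skip=2)  # output of the second-to-last layer
print(last, penultimate)
```

The two settings feed different conditioning vectors to the diffusion model, which is why a mismatch between training and inference can matter.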

Learning rate?

For native training, something around 1e-6 works well.
For LoRA, the default 1e-4 learns the concept quite fast, but there may be some overfitting. On the other hand, 1e-5 seems too low. More experiments are still needed.