Some questions about training

#1
opened by Anglestaw

Your work is nice. I want to know which code you used for training. Is it kohya-ss/sd-scripts? If it is, can you tell me how I should set the trigger word for training, like your "amber (genshin impact)"? I will be grateful for your gracious help.

I use kohya-ss/sd-scripts with the DreamBooth LoRA training style. Each of my images has a corresponding .txt file with its tags in it. The tags are the tag_string_general from the image's Danbooru metadata. I then add amber (genshin impact) to the front of every .txt file, and when training I use --keep_tokens=1 so that amber (genshin impact) is always kept in the training example while the other tags can get shuffled.
A popular alternative to using Danbooru metadata is an AI-based tagger; sd-scripts has a script for the WD tagger, I think it's under the finetune folder.
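
For illustration only, a minimal sketch of that caption setup (the folder name and trigger word are placeholders, not my actual paths): prepend the trigger word to every .txt caption so that, combined with --shuffle_caption and --keep_tokens=1, it always stays as the first tag.

```python
from pathlib import Path

# Sketch: prepend the trigger word to every caption file in a dreambooth-style
# "<repeats>_<name>" image folder. The path and trigger word are placeholders.
dataset_dir = Path("train/100_amber")
trigger = "amber (genshin impact)"

for caption_file in dataset_dir.glob("*.txt"):
    tags = caption_file.read_text(encoding="utf-8").strip()
    if not tags.startswith(trigger):
        caption_file.write_text(f"{trigger}, {tags}", encoding="utf-8")
```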

I trained some safetensors. There are no big problems except the face: the character's face predicted by SD with my own LoRA safetensors is sometimes odd and unnatural. I always train with 66 images and 100 repeats for each image. Is that too small a repeat count per image? Your safetensors are so great; the quality of the output images is very nice, with a perfect body and face. That impresses me very much.

66 images is a decent amount for characters. I usually try to do around 1000 steps; images * repeats * epochs / batch size = 1000 is roughly the equation I try to solve.
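
As a purely illustrative example of that arithmetic (the repeat, epoch, and batch values here are made up, not my actual settings):

```python
# Rough step budget: optimizer steps = images * repeats * epochs / batch size.
images, repeats, epochs, batch_size = 66, 5, 6, 2

total_steps = images * repeats * epochs / batch_size
print(total_steps)  # 990.0, close to the ~1000-step target
```
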
Try adjusting learning rates; I sometimes have to retrain multiple times to get it looking good, and the learning rate is pretty impactful. The most important change is usually in the dataset: if the model isn't coming out well, I would also double-check that all the images are good. I rarely get the results I want on the very first try.
If faces are coming out weird even without using a LoRA, then you might have "Restore faces" checked, which isn't useful for anime images since it is trained on photographs.
You can also try training at 768 resolution instead of 512 (also increase your bucket resolution to something like 480,1280). Larger-resolution training is much slower, but it helps with capturing very small details. That said, Amber was trained at 512 and worked out fine.
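
Not the exact command I ran, but a hedged sketch of how the 768 resolution and the wider bucket range could be passed to sd-scripts' train_network.py; the model path, data/output folders, batch size, and epoch count are placeholders, and flag names should be checked against your sd-scripts version.

```python
import subprocess

# Illustrative invocation only; paths and numeric values are placeholders.
cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path=base_model.safetensors",
    "--train_data_dir=train",          # contains the "<repeats>_<name>" image/caption folder
    "--output_dir=output",
    "--network_module=networks.lora",
    "--resolution=768,768",            # up from 512 to capture small details
    "--enable_bucket",
    "--min_bucket_reso=480",
    "--max_bucket_reso=1280",
    "--shuffle_caption",
    "--keep_tokens=1",                 # keep the trigger word fixed at the front of the caption
    "--caption_extension=.txt",
    "--train_batch_size=2",
    "--max_train_epochs=6",
    "--save_model_as=safetensors",
]
subprocess.run(cmd, check=True)
```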

I tried training some safetensors, and the only problem I found is wrong or unclean eyes and faces, even though the people in my dataset have nice, clean faces and eyes. I found your safetensors handle the face and eyes perfectly even when just using the trigger word "AMBER (GENSHIN IMPACT)". I wonder if your learning rate is a list, like [0.0001, 0.00001, 0.000001, 0.0000001], rather than just setting learning_rate to 0.0001? My thought is that the key to training wonderful small parts like the face and eyes is a small learning rate after training a nice body and background with a bigger learning rate. Is that right? I will keep working to correct my training process, and I will be very grateful for your kindness and gracious suggestions. It would be wonderful if you put a LoRA training video on YouTube; I am sure a lot of people would enjoy it, because there are only a few LoRA training videos on YouTube.

I mostly use the constant learning rate scheduler, so the learning rate wouldn't be changing (although AdamW, which everyone using kohya_ss will be using, does adapt the learning rate internally).
I find learning rates between 2e-5 and 8e-5 to be fairly effective (the default is 1e-4, and I find that too high for large-dim LoRAs). You want to avoid high learning rates, since those will quickly end up with fried outputs that do not look good. Very low learning rates aren't good either, because it will take forever to learn anything.
When it learns, it learns everything in the image at the same time, so there isn't anything like using different learning rates to learn different parts of the image.
If the eyes and face aren't coming out great and the base model you are using is capable of generating good faces, such as AnythingV3, then I would double-check your training data. Maybe there is too much variation in the style, or you don't have enough images that clearly show the face in detail.
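
For reference, a minimal sketch of how those learning-rate settings might look as train_network.py arguments; the exact value is illustrative, and the optimizer flag assumes a recent sd-scripts version (older ones used --use_8bit_adam instead of --optimizer_type).

```python
# Illustrative learning-rate flags; the value is a placeholder in the 2e-5 to 8e-5 range.
lr_args = [
    "--learning_rate=5e-5",        # the 1e-4 default is often too hot for large-dim LoRAs
    "--lr_scheduler=constant",     # constant scheduler, so the base rate stays fixed
    "--optimizer_type=AdamW8bit",  # AdamW still adapts per-parameter step sizes internally
]
```

These could simply be appended to the command sketched earlier.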

I have found the key to this problem. I trained 1 epoch with the repeat count set to 100 for each image, and the predicted image had wrong eyes and face. Then I tried training 2 epochs with the repeat count set to 50, and I got a predicted image with better eyes and face. After that, I trained 10 epochs with 10 repeats, and the eyes and face are even better. But, after all that, my safetensors still cannot predict eyes and faces as wonderful and stable as yours in txt2img and img2img, especially in img2img. I will keep training for a better safetensors.
