# Chidori Michiru (Blue Archive)
## Usage
Use any or all of these tags to summon Michiru:
`michiru, 1girl, halo, yellow eyes, grey hair, small breasts`
Tag pruning means you don't need to prompt for much other than `michiru` to get a pretty good result -- eyes and hair are optional but can help correct occasional mistakes.
The AI really tries to do her ninja `kuji-in` but unsurprisingly fucks it up more often than not, holding up the wrong number of fingers or just generating messed up hands.
The tail is a bit iffy -- I tagged it as `tail` but prompting for that gets a cat tail half the time. Try `raccoon tail`.
For her normal outfit:
`school uniform, blue skirt, black pantyhose, floral print, black scarf, bridal gauntlets`
I tagged images with `sarashi` even when it was only partially visible under her uniform.
Unlike with Koharu's LoRA, I was a little more specific with clothing colors when tagging. Skirts are always tagged `blue skirt`, the scarf is always `black scarf`, etc. The hope was that this would make it possible to get her normal clothing in a different color, and avoid overfitting `skirt` and `scarf` to the exact ones she normally wears. It sorta works, but if you want a `red skirt` you have to emphasize it and negative prompt `blue skirt`.
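
If you want to apply the same color-specific tagging to your own dataset, something like the following could work. This is only a rough sketch, not the tool actually used here: the `COLOR_SPECIFIC` mapping and the `normalize_caption` helper are made up for illustration, and it assumes captions are plain comma-separated tag lists.

```python
# Hypothetical mapping from generic outfit tags to color-specific ones.
# The real tagging for this LoRA was done by hand-editing WD1.4 output.
COLOR_SPECIFIC = {
    "skirt": "blue skirt",
    "scarf": "black scarf",
}


def normalize_caption(caption: str) -> str:
    """Rewrite generic tags to color-specific ones and drop duplicates, keeping order."""
    seen = set()
    result = []
    for raw in caption.split(","):
        tag = raw.strip()
        if not tag:
            continue  # skip empty entries left by stray commas
        tag = COLOR_SPECIFIC.get(tag, tag)
        if tag not in seen:
            seen.add(tag)
            result.append(tag)
    return ", ".join(result)


if __name__ == "__main__":
    print(normalize_caption("michiru, 1girl, skirt, blue skirt, scarf, halo"))
    # prints: michiru, 1girl, blue skirt, black scarf, halo
```

Tags that already carry a color (like `red skirt`) pass through untouched, so alternate outfits in the dataset would keep their own colors.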
Weights from 0.8 to 1.05 should work well; it's perhaps slightly overtrained. Both epoch 3 and epoch 4 are included, and epoch 3 might actually be a bit better.
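
To put the tags, the weight range, and the skirt recoloring trick together, here is a small prompt builder. Treat it as a sketch under assumptions: it uses the `<lora:name:weight>` and `(tag:1.3)` emphasis syntax from the AUTOMATIC1111 web UI (other front-ends differ), and the LoRA name `michiru`, the default emphasis of 1.3, and the `build_prompts` helper itself are placeholders, not something shipped with this repo.

```python
def build_prompts(lora_name="michiru", weight=0.9, skirt_color=None, emphasis=1.3):
    """Return (prompt, negative_prompt) for Michiru's normal outfit.

    lora_name is a placeholder; use whatever your downloaded file is called.
    weight should usually sit in the 0.8 to 1.05 range suggested above.
    """
    tags = [
        "michiru", "1girl", "halo", "yellow eyes", "grey hair",
        "school uniform", "black pantyhose", "black scarf",
    ]
    negative = []
    if skirt_color is None or skirt_color == "blue":
        tags.append("blue skirt")
    else:
        # Emphasize the new color and push the trained one into the negative prompt.
        tags.append(f"({skirt_color} skirt:{emphasis})")
        negative.append("blue skirt")
    prompt = f"<lora:{lora_name}:{weight}>, " + ", ".join(tags)
    return prompt, ", ".join(negative)


if __name__ == "__main__":
    prompt, negative = build_prompts(skirt_color="red")
    print(prompt)
    print("Negative:", negative)
```

If the recolor doesn't take, bumping `emphasis` a little or dropping `weight` toward 0.8 are the first things to try.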
## Training
*All parameters are provided in the accompanying JSON files.*
- Trained on a curated set of 88 images repeated 10 times.
- Dataset included a mixture of SFW and NSFW.
- This dataset was smaller than Koharu's, but I maintained the same number of steps.
- Initially tagged with WD1.4, then performed heavy pruning and editing.
  - Removed as many inaccurate tags as possible
  - Made sure important traits were present and consistently described, and traits like `halo` were consistent with actual visibility
  - Pruned redundant tags and simplified outfits so that they were always tagged with the same handful of tags
  - Added camera angles and image composition hints
  - Added a few facial expressions
- Different learning rate than usual.
  - 5e-5 text encoder (same as Koharu's, but typically 1e-5 ~ 2e-5)
  - 3e-4 UNet (typically one order of magnitude higher than the text encoder)
- Trained without VAE.
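
For a rough sense of scale, the numbers above can be turned into step counts. This is a back-of-the-envelope sketch: the batch size and the epoch count of 4 are assumptions (the real values are in the JSON files), and `steps_per_epoch` is just an illustrative helper.

```python
import math


def steps_per_epoch(num_images: int, repeats: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, rounding up for a partial final batch."""
    return math.ceil(num_images * repeats / batch_size)


NUM_IMAGES = 88        # from the list above
REPEATS = 10           # from the list above
BATCH_SIZE = 2         # assumption; check the JSON files for the real value
EPOCHS = 4             # assumption; epochs 3 and 4 are the ones included

TEXT_ENCODER_LR = 5e-5
UNET_LR = 3e-4

if __name__ == "__main__":
    per_epoch = steps_per_epoch(NUM_IMAGES, REPEATS, BATCH_SIZE)
    print(f"samples per epoch: {NUM_IMAGES * REPEATS}")
    print(f"steps per epoch:   {per_epoch}")
    print(f"total steps:       {per_epoch * EPOCHS}")
    print(f"UNet / text encoder LR ratio: {UNET_LR / TEXT_ENCODER_LR:.1f}")
```

Note that with these learning rates the UNet rate is about 6x the text encoder rate, a bit under the usual order of magnitude, because the text encoder rate is higher than typical.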