This is my first time doing any sort of Stable Diffusion training, so I went through a lot of trial and error.  Here are my findings in case they help anyone.

# Training
*All parameters are provided in the accompanying JSON files.*
- Trained on 138 curated images, repeated 8 times (1104 total images / batch size of 3 = 368 iterations)
  - Dataset included a mixture of SFW and NSFW images
  - I pruned most images with white backgrounds because I felt they might have been negatively impacting my results early on, but in hindsight I think poor training parameters were the real culprit.
- Dataset was tagged with the WD1.4 interrogator.  Caption shuffling was disabled.
  - `mutsuki, blue archive` were added to the start of each caption (see the sketch after this list).
- Two variants are included: one trained at 512px max resolution and another at 768px max resolution.  All other parameters are identical.
- Trained on RTX 4090 for about 2min30sec (512px variant) and 6min30sec (768px variant)
  - I tried using higher batch sizes with the 512px variant for faster training, but the results seemed noticeably worse.
  - Small batch sizes seem to work better even when you have the VRAM for 10 or 12, so I put that VRAM towards training a higher-resolution variant instead.
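
As an illustration of the caption prefixing mentioned above, here is a minimal sketch of how the trigger tags could be prepended to WD1.4-style caption files. The `dataset/` folder name and the helper itself are hypothetical, not the exact script I used; it assumes each image has a sibling `.txt` caption file.

```python
from pathlib import Path

TRIGGER = "mutsuki, blue archive"

# Assumed layout: dataset/ contains the images plus one .txt caption per image.
for caption_file in Path("dataset").glob("*.txt"):
    tags = caption_file.read_text(encoding="utf-8").strip()
    if not tags.startswith(TRIGGER):
        # Prepend the trigger tags so they lead every caption.
        caption_file.write_text(f"{TRIGGER}, {tags}", encoding="utf-8")
```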

# Usage
Mutsuki needs a few tags to be summoned reliably.  Some common tags in her dataset:
`1girl, halo, side ponytail, long hair, white hair, purple eyes, jacket, red skirt, light grin, small breasts`

You can include or omit `mutsuki, blue archive`; while they were in her captions, they don't seem to be particularly strong tags for some reason.

You can use either the 512px or the 768px variant.  I want to say the 768px one is better, but it's hard to say definitively.  Give both a shot and post your findings.

A weight of 0.80-1.05 should work well depending on the model.
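
For reference, here is a minimal sketch of loading a LoRA like this with the `diffusers` library. The base model, the LoRA filename, and the 0.9 scale are placeholders I chose for the example, not part of this release; substitute whichever anime-style SD checkpoint and file path you actually use.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder base model; swap in your preferred anime-style SD 1.5 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical filename for the 768px variant of this LoRA.
pipe.load_lora_weights(".", weight_name="mutsuki_768.safetensors")

prompt = (
    "mutsuki, blue archive, 1girl, halo, side ponytail, long hair, "
    "white hair, purple eyes, jacket, red skirt, light grin, small breasts"
)

# cross_attention_kwargs={"scale": ...} applies the LoRA at partial strength,
# roughly matching the 0.80-1.05 weight range suggested above.
image = pipe(prompt, cross_attention_kwargs={"scale": 0.9}).images[0]
image.save("mutsuki.png")
```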