---
license: creativeml-openrail-m
---
### Trigger words
```
Anisphia, Euphyllia, Tilty, OyamaMahiro, OyamaMihari
by onono imoko, by momoko, by mochizuki kei, by kantoku, by ke-ta
aniscreen, fanart
```
For `0324_all_aniscreen_tags`, I accidentally tagged all the character images with `aniscreen`.
For `0325_aniscreen_fanart_styles`, things are done correctly (anime screenshots tagged as `aniscreen`, fanart tagged as `fanart`).
### Settings
Default settings are
- loha net dim 8, conv dim 4, alpha 1
- lr 2e-4, constant scheduler throughout
- Adam8bit
- resolution 512
- clip skip 1
The file names indicate how the settings differ from this default setup.
The configuration json files can otherwise be found in the `config` subdirectory of each folder.
However, some experiments concern the effect of tags. For these I regenerate the txt files, so the difference cannot be seen in the configuration file.
For now this only concerns `05tag`, for which tags are used with probability 0.5.
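
As an illustration of the `05tag` setup, here is a minimal sketch of how caption files could be regenerated with tag dropout. It assumes one reading of "used with probability 0.5", namely that each comma-separated tag is kept independently with that probability. The function names, directory arguments, and seed are hypothetical and are not taken from the actual experiment scripts.

```python
import random
from pathlib import Path


def drop_tags(caption: str, keep_prob: float, rng: random.Random) -> str:
    """Keep each comma-separated tag independently with probability keep_prob."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if rng.random() < keep_prob]
    return ", ".join(kept)


def regenerate_captions(src_dir: Path, dst_dir: Path,
                        keep_prob: float = 0.5, seed: int = 0) -> None:
    """Write a tag-dropped copy of every .txt caption in src_dir into dst_dir."""
    rng = random.Random(seed)  # fixed seed so the regenerated captions are reproducible
    dst_dir.mkdir(parents=True, exist_ok=True)
    for txt in sorted(src_dir.glob("*.txt")):
        caption = txt.read_text(encoding="utf-8")
        (dst_dir / txt.name).write_text(drop_tags(caption, keep_prob, rng),
                                        encoding="utf-8")
```

Note that this sketch treats trigger words like any other tag. Whether they were also subject to dropout in `05tag` is not stated here.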
### Some observations
For a thorough comparison, please refer to the `generated_samples` folder.
#### Captioning
The dataset is, in general, the most important factor of all.
The common wisdom that we should prune any tag describing a trait we want attached to the trigger word is exactly the way to go.
Using no tags at all (top three rows) gives terrible results, especially for style training.
Keeping all the tags (bottom three rows) removes the traits from the subjects when these tags are not used during sampling (not completely true, but more or less the case).
![00066-20230326090858](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00066-20230326090858.png)
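
The pruning rule above can be sketched as a small caption filter. This is only an illustration: the function name `prune_tags` and the example tags are hypothetical, and the captions are assumed to be comma-separated tag lists.

```python
def prune_tags(caption: str, trigger: str, absorbed: set[str]) -> str:
    """Drop tags whose traits should be absorbed by the trigger word.

    The trigger word is placed first and every tag in `absorbed` is removed,
    so the trait is learned as part of the trigger instead of a separate tag.
    """
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if t != trigger and t not in absorbed]
    return ", ".join([trigger] + kept)


# Hypothetical caption and trait tags, for illustration only.
caption = "Anisphia, blonde hair, green eyes, smile, outdoors"
print(prune_tags(caption, "Anisphia", {"blonde hair", "green eyes"}))
# prints "Anisphia, smile, outdoors"
```

Tags that vary between images (expression, background) stay in the caption, so they remain controllable at sampling time.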
#### Training Resolution
The most prominent benefit of training at higher resolution is that it helps generate more complex and detailed backgrounds.
You may also get more detail in the outfits, pupils, etc.
However, training at higher resolution is quite time-consuming, and most of the time it is probably not worth it.
For example, if you want better backgrounds, it can be simpler to switch the model (unless, say, you are actually training a background lora).
![00045-20230326045748](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00045-20230326045748.png)
![00044-20230326044731](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00044-20230326044731.png)
#### Text encoder learning rate
It is often suggested to set the text encoder learning rate lower than that of the unet.
This of course makes training slower, while the benefit is hard to evaluate.
In one experiment I halve the text encoder learning rate and train the model twice as long.
After spending some time on this, here are two situations that reveal the potential benefit of this practice.
- In my training set I have anime screenshots, tagged with `aniscreen`, and fanart, tagged with `fanart`.
Although they are balanced to have the same weight, the consistency of anime screenshots seems to drive the characters toward this style by default.
Putting `aniscreen` in the negative prompt gives bad results in general, but the model trained with the lower text encoder learning rate seems to survive best.
Note that Tilty (second image) is only trained with anime screenshots.
![00165-20230327023658](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00165-20230327023658.png)
![00085-20230327030828](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00085-20230327030828.png)
- Training at a lower text encoder learning rate should better preserve the model's ability to understand the prompt.
This aspect is difficult to test, but it seems to be confirmed by this "umbrella" experiment (though some other setups, such as lora and higher dimension, seem to give even better results).
![00083-20230327015201](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00083-20230327015201.png)
In any case, I still believe that if we want the best results we should avoid text encoder training completely and do [pivotal tuning](https://github.com/cloneofsimo/lora/discussions/121) instead.
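
For reference, the "half text encoder learning rate, twice as long" variant described in this section can be written as a derived settings object. This is a sketch under assumptions: the field names are hypothetical (they are not the keys of the json files in `config`), the default text encoder rate is assumed to equal the 2e-4 default lr, and the epoch count is a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainSettings:
    unet_lr: float = 2e-4          # default lr from the Settings section
    text_encoder_lr: float = 2e-4  # assumption: same as the unet by default
    epochs: int = 10               # placeholder, the real value is not stated here


def halve_text_encoder_lr(base: TrainSettings) -> TrainSettings:
    """Halve the text encoder learning rate and double the training length."""
    return replace(base,
                   text_encoder_lr=base.text_encoder_lr / 2,
                   epochs=base.epochs * 2)


variant = halve_text_encoder_lr(TrainSettings())
print(variant)
```

The unet learning rate is left untouched, so only the text encoder is slowed down. The doubled length compensates for the slower text encoder at the cost of twice the training time.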
#### Others
1. I barely see any difference between training at clip skip 1 and 2.
2. The differences between lora, locon, and loha are very subtle.
### Datasets
Here is the composition of the datasets
```
17_characters~fanart~OyamaMihari: 53
19_characters~fanart~OyamaMahiro+OyamaMihari: 47
1_artists~kantoku: 2190
24_characters~fanart~Anisphia: 37
28_characters~screenshots~Anisphia+Tilty: 24
2_artists~ke-ta: 738
2_artists~momoko: 762
2_characters~screenshots~Euphyllia: 235
3_characters~fanart~OyamaMahiro: 299
3_characters~screenshots~Anisphia: 217
3_characters~screenshots~OyamaMahiro: 210
3_characters~screenshots~OyamaMahiro+OyamaMihari: 199
3_characters~screenshots~OyamaMihari: 177
4_characters~screenshots~Anisphia+Euphyllia: 165
57_characters~fanart~Euphyllia: 16
5_artists~mochizuki_kei: 426
5_artists~onono_imoko: 373
7_characters~screenshots~Tilty: 95
9_characters~fanart~Anisphia+Euphyllia: 97
```
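
To compare how much each part of the dataset contributes per epoch, the listing above can be parsed as follows. This sketch assumes that the numeric prefix of each folder name is the repeat count (the usual `N_name` folder convention of kohya-ss style trainers) and that the number after the colon is the image count. The function name `effective_counts` is hypothetical.

```python
from collections import defaultdict

COMPOSITION = """\
17_characters~fanart~OyamaMihari: 53
19_characters~fanart~OyamaMahiro+OyamaMihari: 47
1_artists~kantoku: 2190
24_characters~fanart~Anisphia: 37
28_characters~screenshots~Anisphia+Tilty: 24
2_artists~ke-ta: 738
2_artists~momoko: 762
2_characters~screenshots~Euphyllia: 235
3_characters~fanart~OyamaMahiro: 299
3_characters~screenshots~Anisphia: 217
3_characters~screenshots~OyamaMahiro: 210
3_characters~screenshots~OyamaMahiro+OyamaMihari: 199
3_characters~screenshots~OyamaMihari: 177
4_characters~screenshots~Anisphia+Euphyllia: 165
57_characters~fanart~Euphyllia: 16
5_artists~mochizuki_kei: 426
5_artists~onono_imoko: 373
7_characters~screenshots~Tilty: 95
9_characters~fanart~Anisphia+Euphyllia: 97
"""


def effective_counts(listing: str) -> dict[str, int]:
    """Sum repeats * images per group: 'artists', 'fanart', or 'screenshots'."""
    totals: dict[str, int] = defaultdict(int)
    for line in listing.strip().splitlines():
        name, count = line.rsplit(":", 1)
        repeats, rest = name.split("_", 1)  # split on the first underscore only
        parts = rest.split("~")
        # Artist folders have no fanart/screenshots level, so group them together.
        group = "artists" if parts[0] == "artists" else parts[1]
        totals[group] += int(repeats) * int(count)
    return dict(totals)


for group, total in sorted(effective_counts(COMPOSITION).items()):
    print(f"{group}: {total}")
```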