Disty0 committed on
Commit
37f1d21
1 Parent(s): 2cf1800

Upload README.md

Files changed (1)
  1. README.md +220 -0
README.md ADDED
---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of Stable Cascade.
Currently in a very early stage of training.
No commercial use is allowed, thanks to StabilityAI's license.

## Code Example

```shell
pip install diffusers
```
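
Stable Cascade support is only present in relatively recent diffusers releases, and `enable_model_cpu_offload()` requires `accelerate`. If the import below fails, upgrading along these lines usually helps (the pinned minimum version is an assumption, not from this card):

```shell
pip install -U "diffusers>=0.27.0" transformers accelerate safetensors
```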

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

prior = StableCascadePriorPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("SoteDiffusion-Cascade_Decoder", torch_dtype=torch.float16)

# Stage C (prior): turn the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=30
)

# Stage B (decoder): turn the embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```
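
For reproducible results you can pass a fixed `torch.Generator` to both stages; `generator` is a standard diffusers pipeline argument, and everything else mirrors the example above (the seed value is arbitrary):

```python
import torch

# Assumes `prior` and `decoder` are already loaded as in the example above.
seed = 42

prior_output = prior(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    guidance_scale=6.0,
    num_inference_steps=30,
    generator=torch.Generator(device="cpu").manual_seed(seed),
)
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.0,
    num_inference_steps=10,
    generator=torch.Generator(device="cpu").manual_seed(seed),
).images[0]
image.save("cascade_seed42.png")
```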


## Training Status:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

| dataset name | training done | remaining |
|---|---|---|
| **newest** | 002 | 218 |
| **late** | 002 | 204 |
| **mid** | 002 | 199 |
| **early** | 002 | 053 |
| **oldest** | 002 | 014 |
| **pixiv** | 002 | 072 |
| **visual novel cg** | 002 | 068 |
| **anime wallpaper** | 002 | 011 |
| **Total** | 24 | 839 |

**Note**: chunk numbering starts from 0 and each chunk contains 8,000 images.

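As a quick back-of-the-envelope check (an illustrative sketch; the per-chunk size and totals come from this card):

```python
chunks_done = 3 * 8                            # chunks 000-002 finished for each of the 8 datasets -> 24
images_per_chunk = 8000
images_seen = chunks_done * images_per_chunk   # 192,000 images trained on so far
total_images = 6_860_873                       # full dataset size (see the Dataset section below)
print(f"{images_seen:,} / {total_images:,} images ({images_seen / total_images:.1%})")
# 192,000 / 6,860,873 images (2.8%)
```
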

## Dataset:

**GPU used for captioning**: 1x Intel ARC A770 16GB
**Model used for captioning**: SmilingWolf/wd-v1-4-convnextv2-tagger-v2


| dataset name | total images | total chunks |
|---|---|---|
| **newest** | 1.75M | 221 |
| **late** | 1.65M | 207 |
| **mid** | 1.60M | 202 |
| **early** | 442K | 056 |
| **oldest** | 128K | 017 |
| **pixiv** | 594K | 075 |
| **visual novel cg** | 560K | 071 |
| **anime wallpaper** | 106K | 014 |
| **Total** | 6,860,873 | 863 |

**Note**: Smallest image size is 1280x600 (768,000 pixels).


## Tags:

### Tag Format:

```
aesthetic tags, quality tags, custom tags, date tags, rest of the tags
```
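
For illustration, a caption under this format can be assembled as below (a minimal sketch, not the original captioning script; the tag values are taken from the tables in this section):

```python
def build_caption(aesthetic, quality, custom, date, rest):
    """Join tag groups in the documented order:
    aesthetic tags, quality tags, custom tags, date tags, rest of the tags."""
    parts = [aesthetic, quality, custom, date, *rest]
    return ", ".join(tag for tag in parts if tag)

print(build_caption(
    aesthetic="extremely aesthetic",
    quality="best quality",
    custom="anime wallpaper",     # see the Custom Tags table below
    date="newest",
    rest=["1girl", "solo", "cat ears", "looking at viewer"],
))
# extremely aesthetic, best quality, anime wallpaper, newest, 1girl, solo, cat ears, looking at viewer
```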

### Date:
| tag | date |
|---|---|
| **newest** | 2022 to 2024 |
| **late** | 2019 to 2021 |
| **mid** | 2015 to 2018 |
| **early** | 2011 to 2014 |
| **oldest** | 2005 to 2010 |

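The same mapping as the table above, as a small sketch (the function name and fallback are illustrative):

```python
def date_tag(year: int) -> str | None:
    """Map an image's year to the date tag from the table above."""
    if 2022 <= year <= 2024:
        return "newest"
    if 2019 <= year <= 2021:
        return "late"
    if 2015 <= year <= 2018:
        return "mid"
    if 2011 <= year <= 2014:
        return "early"
    if 2005 <= year <= 2010:
        return "oldest"
    return None  # years outside the listed ranges get no date tag (assumption)

print(date_tag(2023))  # newest
```
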
### Aesthetic Tags:

**Model used**: shadowlilac/aesthetic-shadow

| score greater than | tag |
|---|---|
| **0.980** | extremely aesthetic |
| **0.900** | very aesthetic |
| **0.750** | aesthetic |
| **0.500** | slightly aesthetic |
| **0.350** | not displeasing |
| **0.250** | not aesthetic |
| **0.125** | slightly displeasing |
| **0.025** | displeasing |
| **rest of them** | very displeasing |

### Quality Tags:

**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth


| score greater than | tag |
|---|---|
| **0.980** | best quality |
| **0.900** | high quality |
| **0.750** | great quality |
| **0.500** | medium quality |
| **0.250** | normal quality |
| **0.125** | bad quality |
| **0.025** | low quality |
| **rest of them** | worst quality |

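Both threshold tables read as "the first threshold the score exceeds wins". A small sketch of that bucketing (the helper is illustrative, not the original labeling script; the thresholds are copied from the two tables above):

```python
AESTHETIC_TAGS = [  # thresholds from the Aesthetic Tags table
    (0.980, "extremely aesthetic"), (0.900, "very aesthetic"), (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"), (0.350, "not displeasing"), (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"), (0.025, "displeasing"),
]
QUALITY_TAGS = [  # thresholds from the Quality Tags table
    (0.980, "best quality"), (0.900, "high quality"), (0.750, "great quality"),
    (0.500, "medium quality"), (0.250, "normal quality"), (0.125, "bad quality"),
    (0.025, "low quality"),
]

def score_to_tag(score, table, fallback):
    """Return the tag for the first threshold the score exceeds, else the fallback."""
    for threshold, tag in table:
        if score > threshold:
            return tag
    return fallback

print(score_to_tag(0.93, AESTHETIC_TAGS, "very displeasing"))  # very aesthetic
print(score_to_tag(0.10, QUALITY_TAGS, "worst quality"))       # low quality
```
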
## Custom Tags:

| dataset name | custom tag |
|---|---|
| **booru** | date, |
| **pixiv** | art by Display_Name, |
| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
| **anime wallpaper** | anime wallpaper, |

## Training Params:

**Software used**: Kohya SD-Scripts with Stable Cascade branch
**Base model**: KBlueLeaf/Stable-Cascade-FP16-fixed

### Command:
```
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
    --mixed_precision fp16 \
    --save_precision fp16 \
    --full_fp16 \
    --sdpa \
    --gradient_checkpointing \
    --resolution "1024,1024" \
    --train_batch_size 2 \
    --gradient_accumulation_steps 32 \
    --adaptive_loss_weight \
    --learning_rate 4e-6 \
    --lr_scheduler constant_with_warmup \
    --lr_warmup_steps 100 \
    --optimizer_type adafactor \
    --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
    --max_grad_norm 0 \
    --token_warmup_min 1 \
    --token_warmup_step 0 \
    --shuffle_caption \
    --caption_dropout_rate 0 \
    --caption_tag_dropout_rate 0 \
    --caption_dropout_every_n_epochs 0 \
    --dataset_repeats 1 \
    --save_state \
    --save_every_n_steps 128 \
    --sample_every_n_steps 32 \
    --max_token_length 225 \
    --max_train_epochs 1 \
    --caption_extension ".txt" \
    --max_data_loader_n_workers 2 \
    --persistent_data_loader_workers \
    --enable_bucket \
    --min_bucket_reso 256 \
    --max_bucket_reso 4096 \
    --bucket_reso_steps 64 \
    --bucket_no_upscale \
    --log_with tensorboard \
    --output_name sotediffusion-sc_3b \
    --train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
    --in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
    --output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
    --logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
    --resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
    --stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
    --effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
    --previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
    --sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
```


## Limitations and Bias

### Bias

- This model is intended for anime illustrations.
  Realistic capabilities are not tested at all.
- The current version is biased toward older anime styles.

### Limitations
- Can fall back to realistic outputs.
  Use the "anime illustration" tag to steer it in the right direction.
- Eyes in far shots come out poorly due to the heavy latent compression.