Disty0 committed on
Commit
e7daf3e
1 Parent(s): dcbacef

Update README.md

  1. README.md +286 -2
README.md CHANGED
@@ -1,8 +1,292 @@
1
  ---
2
  pipeline_tag: text-to-image
3
  license: other
4
- license_name: stable-cascade-nc-community
5
  license_link: LICENSE
6
  decoder:
7
  - Disty0/sotediffusion-wuerstchen3-alpha1-decoder
8
- ---
1
  ---
2
  pipeline_tag: text-to-image
3
  license: other
4
+ license_name: faipl-1.0-sd
5
  license_link: LICENSE
6
  decoder:
7
  - Disty0/sotediffusion-wuerstchen3-alpha1-decoder
8
+ ---
9
+
10
+
11
+ # SoteDiffusion Cascade
12
+
13
+ Anime finetune of Würstchen V3.
14
+ Currently in a very early stage of training.
15
+ No commercial use thanks to StabilityAI.
16
+
17
+ # Release Notes
18
+
19
+ Did major cleanup on the dataset in this release.
20
+ Changed the training parameters and started from a fresh state.
21
+ Switched to the Fair AI license. (Still no commercial use.)
22
+
23
+
24
+ <table>
25
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/oKTevlG-qi5Jfdy6TkGeI.png" height="576">
26
+ </table>
27
+
28
+
29
+ # UI Guide
30
+
31
+ ## SD.Next
32
+ Switch to the dev branch:
33
+ ```
34
+ git checkout dev
35
+ ```
36
+ Go to Models -> Huggingface, enter `Disty0/sotediffusion-wuerstchen3-alpha1-decoder` as the model name, and press download.
37
+ Load `Disty0/sotediffusion-wuerstchen3-alpha1-decoder` after the download process is complete.
38
+
39
+ Parameters:
40
+ Sampler: Default
41
+
42
+ Steps: 30 or 40
43
+ Secondary Steps: 10
44
+
45
+ CFG: 8
46
+ Secondary CFG: 1 or 1.2
47
+
48
+ ## ComfyUI
49
+ Please refer to CivitAI: https://civitai.com/models/353284
50
+
51
+
52
+ # Code Example
53
+
54
+ ```shell
55
+ pip install diffusers
56
+ ```
57
+
58
+ ```python
59
+ import torch
60
+ from diffusers import AutoPipelineForText2Image
61
+
62
+ device = "cuda"
63
+ dtype = torch.bfloat16
64
+ model = "Disty0/sotediffusion-wuerstchen3-alpha1-decoder"
65
+
66
+ pipe = AutoPipelineForText2Image.from_pretrained(model, torch_dtype=dtype)
67
+
68
+ # send everything to the gpu:
69
+ pipe = pipe.to(device, dtype=dtype)
70
+ pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
71
+
72
+ # or enable model offload to save vram:
73
+ # pipe.enable_model_cpu_offload()
74
+
75
+
76
+
77
+ prompt = "extremely aesthetic, best quality, newest, general, 1girl, solo, looking at viewer, blush, slight smile, cat ears, long hair, dress, bare shoulders, cherry blossoms, flowers, petals, vegetation, wind,"
78
+ negative_prompt = "very displeasing, worst quality, oldest, monochrome, sketch, loli, child,"
79
+
80
+ output = pipe(
81
+ width=1024,
82
+ height=1536,
83
+ prompt=prompt,
84
+ negative_prompt=negative_prompt,
85
+ decoder_guidance_scale=1.0,
86
+ prior_guidance_scale=8.0,
87
+ prior_num_inference_steps=40,
88
+ output_type="pil",
89
+ num_inference_steps=10
90
+ ).images[0]
91
+
92
+ ## do something with the output image
93
+ ```
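The SD.Next parameters listed in the UI guide appear to line up with the pipeline arguments used in this example: Steps with `prior_num_inference_steps`, Secondary Steps with `num_inference_steps`, CFG with `prior_guidance_scale`, and Secondary CFG with `decoder_guidance_scale`. The sketch below captures that mapping. `cascade_kwargs` is a hypothetical helper name, not part of diffusers, and the correspondence is inferred from the matching values rather than stated by the author.

```python
def cascade_kwargs(steps=40, secondary_steps=10, cfg=8.0, secondary_cfg=1.0):
    """Translate UI-style settings into keyword arguments for the pipeline call.

    Assumption: "Steps" and "CFG" drive the prior stage, while the
    "Secondary" values drive the decoder stage.
    """
    return {
        "prior_num_inference_steps": steps,
        "num_inference_steps": secondary_steps,
        "prior_guidance_scale": cfg,
        "decoder_guidance_scale": secondary_cfg,
    }


kwargs = cascade_kwargs(steps=30, secondary_cfg=1.2)
print(kwargs)

# With a loaded pipeline this could be used as:
# image = pipe(prompt=prompt, negative_prompt=negative_prompt, **kwargs).images[0]
```

Keeping the settings in one dictionary makes it easy to try the alternative values from the UI guide (30 steps, secondary CFG 1.2) without editing the pipeline call itself.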
94
+
95
+
96
+ ## Training Status:
97
+
98
+ **Alpha0 Release**: This release resets the training and enables Text Encoder training.
99
+
100
+
101
+ **GPU used for training**: 1x AMD RX 7900 XTX 24GB
102
+
103
+ | dataset name | training done | remaining |
104
+ |---|---|---|
105
+ | **newest** | 003 | 228 |
106
+ | **recent** | 003 | 169 |
107
+ | **mid** | 003 | 121 |
108
+ | **early** | 003 | 067 |
109
+ | **oldest** | 003 | 017 |
110
+ | **pixiv** | 003 | 039 |
111
+ | **visual novel cg** | 003 | 025 |
112
+ | **anime wallpaper** | 003 | 010 |
113
+ | **Total** | 32 | 676 |
114
+
115
+ **Note**: Chunk numbering starts from 0, so `003` means four chunks are done for that dataset. Each chunk holds 8000 images.
116
+
117
+
118
+ ## Dataset:
119
+
120
+ **GPU used for captioning**: 1x Intel ARC A770 16GB
121
+ **Model used for captioning**: SmilingWolf/wd-swinv2-tagger-v3
122
+ **Command:**
123
+ ```
124
+ python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
125
+ ```
126
+
127
+
128
+ | dataset name | total images | total chunks |
129
+ |---|---|---|
130
+ | **newest** | 1,848,331 | 232 |
131
+ | **recent** | 1,380,630 | 173 |
132
+ | **mid** | 993,227 | 125 |
133
+ | **early** | 566,152 | 071 |
134
+ | **oldest** | 160,397 | 021 |
135
+ | **pixiv** | 343,614 | 043 |
136
+ | **visual novel cg** | 231,358 | 029 |
137
+ | **anime wallpaper** | 104,790 | 014 |
138
+ | **Total** | 5,628,499 | 708 |
139
+
140
+ **Note**:
141
+ - Smallest size is 1280x600 | 768,000 pixels
142
+ - Deduplicated based on image similarity using czkawka-cli
143
+
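The chunk counts in the dataset table are consistent with splitting each dataset into chunks of 8000 images (the chunk size given in the training status note), with the last chunk only partially filled. The following is a small arithmetic check under that assumption; the splitting scheme itself is inferred from the numbers, not described by the author.

```python
IMAGES_PER_CHUNK = 8000  # chunk size from the training status note

# Image counts copied from the dataset table.
image_counts = {
    "newest": 1_848_331,
    "recent": 1_380_630,
    "mid": 993_227,
    "early": 566_152,
    "oldest": 160_397,
    "pixiv": 343_614,
    "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}


def chunk_count(images, per_chunk=IMAGES_PER_CHUNK):
    """Number of chunks needed, counting a partially filled last chunk."""
    return -(-images // per_chunk)  # integer ceiling division


chunks = {name: chunk_count(count) for name, count in image_counts.items()}

for name, count in chunks.items():
    print(f"{name}: {count} chunks")

print("total images:", sum(image_counts.values()))  # 5628499
print("total chunks:", sum(chunks.values()))        # 708
```

Subtracting the 32 finished chunks from the 708 total leaves 676 chunks still to train.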
144
+
145
+ ## Tags:
146
+
147
+ ```
148
+ aesthetic tags, quality tags, date tags, custom tags, rating tags, character tags, rest of the tags
149
+ ```
150
+
151
+ ### Date:
152
+ | tag | date |
153
+ |---|---|
154
+ | **newest** | 2022 to 2024 |
155
+ | **recent** | 2019 to 2021 |
156
+ | **mid** | 2015 to 2018 |
157
+ | **early** | 2011 to 2014 |
158
+ | **oldest** | 2005 to 2010 |
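As an illustration of the date table, the lookup below maps a year to its date tag. `date_tag` is a hypothetical helper, and returning `None` for years outside the table is an assumption, since the source does not say how such images are handled.

```python
# (first year, last year, tag), copied from the date table above.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "recent"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year):
    """Return the date tag for a year, or None if the year is not covered."""
    for first, last, tag in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None


for year in (2023, 2016, 2005, 1999):
    print(year, date_tag(year))
```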
159
+
160
+ ### Aesthetic Tags:
161
+
162
+ **Model used**: shadowlilac/aesthetic-shadow-v2
163
+
164
+ | score greater than | tag |
165
+ |---|---|
166
+ | **0.90** | extremely aesthetic |
167
+ | **0.80** | very aesthetic |
168
+ | **0.70** | aesthetic |
169
+ | **0.50** | slightly aesthetic |
170
+ | **0.40** | not displeasing |
171
+ | **0.30** | not aesthetic |
172
+ | **0.20** | slightly displeasing |
173
+ | **0.10** | displeasing |
174
+ | **rest of them** | very displeasing |
175
+
176
+ ### Quality Tags:
177
+
178
+ **Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
179
+
180
+
181
+ | score greater than | tag |
182
+ |---|---|
183
+ | **0.980** | best quality |
184
+ | **0.900** | high quality |
185
+ | **0.750** | great quality |
186
+ | **0.500** | medium quality |
187
+ | **0.250** | normal quality |
188
+ | **0.125** | bad quality |
189
+ | **0.025** | low quality |
190
+ | **rest of them** | worst quality |
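Both the aesthetic and the quality tables can be read as ordered threshold lookups: the first threshold that the score exceeds decides the tag. The sketch below shows one way to apply them. `score_to_tag` is a hypothetical helper, and the strict "greater than" comparison follows the table headers; how the scoring models scale their outputs is not covered here.

```python
# (threshold, tag) pairs copied from the tables above, highest threshold first.
AESTHETIC_TAGS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]

QUALITY_TAGS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score, thresholds, fallback):
    """Return the tag of the first threshold the score is strictly above."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback  # the "rest of them" row


print(score_to_tag(0.95, AESTHETIC_TAGS, "very displeasing"))
print(score_to_tag(0.60, QUALITY_TAGS, "worst quality"))
```

Because the comparison is strict, a score exactly on a boundary falls into the next lower tag.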
191
+
192
+ ### Rating Tags:
193
+ - general
194
+ - sensitive
195
+ - nsfw
196
+ - explicit nsfw
197
+
198
+ ### Custom Tags:
199
+
200
+ | dataset name | custom tag |
201
+ |---|---|
202
+ | **image boards** | date, |
203
+ | **pixiv** | art by Display_Name, |
204
+ | **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
205
+ | **anime wallpaper** | date, anime wallpaper, |
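The tag order given at the start of the Tags section (aesthetic, quality, date, custom, rating, character, rest) can be sketched as a small caption builder. `build_caption` and its parameter names are hypothetical, and the `", "` separator is taken from the `--caption_separator` value in the commands. This illustrates the ordering only and is not the author's actual captioning code.

```python
def build_caption(aesthetic, quality, date=None, custom=(), rating=None,
                  characters=(), rest=(), separator=", "):
    """Join tag groups in the documented order, skipping empty groups."""
    parts = [aesthetic, quality]
    if date:
        parts.append(date)
    parts.extend(custom)
    if rating:
        parts.append(rating)
    parts.extend(characters)
    parts.extend(rest)
    return separator.join(parts)


caption = build_caption(
    aesthetic="extremely aesthetic",
    quality="best quality",
    date="newest",
    rating="general",
    rest=["1girl", "solo"],
)
print(caption)
```

The result has the same leading structure as the prompt in the code example, which suggests that prompts work best when they follow the training tag order.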
206
+
207
+ ## Training Params:
208
+
209
+ **Software used**: Kohya SD-Scripts with Stable Cascade branch
210
+ **Base model**: Disty0/sote-diffusion-cascade-alpha0
211
+
212
+ ### Command:
213
+ ```shell
214
+ LD_PRELOAD=/usr/lib/libtcmalloc.so.4 accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
215
+ --mixed_precision fp16 \
216
+ --save_precision fp16 \
217
+ --full_fp16 \
218
+ --sdpa \
219
+ --gradient_checkpointing \
220
+ --train_text_encoder \
221
+ --resolution "1024,1024" \
222
+ --train_batch_size 2 \
223
+ --gradient_accumulation_steps 8 \
224
+ --learning_rate 1e-5 \
225
+ --learning_rate_te1 1e-5 \
226
+ --lr_scheduler constant_with_warmup \
227
+ --lr_warmup_steps 100 \
228
+ --optimizer_type adafactor \
229
+ --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
230
+ --max_grad_norm 0 \
231
+ --token_warmup_min 1 \
232
+ --token_warmup_step 0 \
233
+ --shuffle_caption \
234
+ --caption_separator ", " \
235
+ --caption_dropout_rate 0 \
236
+ --caption_tag_dropout_rate 0 \
237
+ --caption_dropout_every_n_epochs 0 \
238
+ --dataset_repeats 1 \
239
+ --save_state \
240
+ --save_every_n_steps 256 \
241
+ --sample_every_n_steps 64 \
242
+ --max_token_length 225 \
243
+ --max_train_epochs 1 \
244
+ --caption_extension ".txt" \
245
+ --max_data_loader_n_workers 2 \
246
+ --persistent_data_loader_workers \
247
+ --enable_bucket \
248
+ --min_bucket_reso 256 \
249
+ --max_bucket_reso 4096 \
250
+ --bucket_reso_steps 64 \
251
+ --bucket_no_upscale \
252
+ --log_with tensorboard \
253
+ --output_name sotediffusion-wr3_3b \
254
+ --train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0004/0005 \
255
+ --in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0004/0005.json \
256
+ --output_dir /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0005 \
257
+ --logging_dir /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0005/logs \
258
+ --resume /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b-state \
259
+ --stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b.safetensors \
260
+ --text_model_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b_text_model.safetensors \
261
+ --effnet_checkpoint_path /mnt/DataSSD/AI/models/wuerstchen3/effnet_encoder.safetensors \
262
+ --previewer_checkpoint_path /mnt/DataSSD/AI/models/wuerstchen3/previewer.safetensors \
263
+ --sample_prompts /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/config/sotediffusion-prompt.txt
264
+ ```
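As a rough reading of the batch settings in the command, the arithmetic below estimates the effective batch size and the number of optimizer updates per chunk. It assumes a single GPU (as stated in the training status) and a full chunk of 8000 images. Aspect ratio bucketing can change the exact number of batches, so the step count is an approximation.

```python
import math

train_batch_size = 2             # --train_batch_size
gradient_accumulation_steps = 8  # --gradient_accumulation_steps
num_gpus = 1                     # 1x AMD RX 7900 XTX
images_per_chunk = 8000          # chunk size from the training status note

# Images that contribute to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer updates for one pass over one chunk
# (--max_train_epochs 1, --dataset_repeats 1).
updates_per_chunk = math.ceil(images_per_chunk / effective_batch_size)

print("effective batch size:", effective_batch_size)  # 16
print("optimizer updates per chunk:", updates_per_chunk)  # 500
```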
265
+
266
+
267
+ ## Limitations and Bias
268
+
269
+ ### Bias
270
+
271
+ - This model is intended for anime illustrations.
272
+ Realistic capabilities are not tested at all.
273
+ - Still underbaked.
274
+
275
+ ### Limitations
276
+ - Output can fall back to a realistic style.
276
+ Add the "realistic" tag to the negative prompt when this happens.
278
+ - Eyes in far shots can be bad.
278
+ - Anatomy and hands can be bad.
280
+
281
+
282
+ ## License
283
+ (This part is copied directly from Animagine V3.1 and modified.)
284
+
285
+ SoteDiffusion models fall under the [Fair AI Public License 1.0-SD](https://freedevproject.org/faipl-1.0-sd/), which is compatible with the Stable Diffusion models' license. Key points:
286
+
287
+ 1. **Modification Sharing:** If you modify SoteDiffusion models, you must share both your changes and the original license.
288
+ 2. **Source Code Accessibility:** If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
289
+ 3. **Distribution Terms:** Any distribution must be under this license or another with similar rules.
290
+ 4. **Compliance:** Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
291
+
292
+ **Notes**: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, which is named LICENSE_INHERIT. This means there is still no commercial use of any kind.