Disty0/sotediffusion-wuerstchen3-alpha1

Text-to-Image, Diffusers, Safetensors, StableCascadePriorPipeline


Disty0 committed on Apr 25

Commit e7daf3e

1 Parent(s): dcbacef

Update README.md


Files changed (1)

README.md +286 -2

README.md CHANGED

@@ -1,8 +1,292 @@
 ---
 pipeline_tag: text-to-image
 license: other
-license_name: stable-cascade-nc-community
 license_link: LICENSE
 decoder:
 - Disty0/sotediffusion-wuerstchen3-alpha1-decoder
----

 ---
 pipeline_tag: text-to-image
 license: other
+license_name: faipl-1.0-sd
 license_link: LICENSE
 decoder:
 - Disty0/sotediffusion-wuerstchen3-alpha1-decoder
+---
+# SoteDiffusion Cascade
+Anime finetune of Würstchen V3.
+Currently in a very early stage of training.
+No commercial use, due to StabilityAI's license.
+# Release Notes
+This release includes a major cleanup of the dataset.
+Changed the training parameters and started from a fresh state.
+Switched to the Fair AI license. (Still no commercial use.)
+<table>
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/oKTevlG-qi5Jfdy6TkGeI.png" height="576">
+</table>
+# UI Guide
+## SD.Next
+Switch to the dev branch:
+```
+git checkout dev
+```
+Go to Models -> Huggingface, type `Disty0/sotediffusion-wuerstchen3-alpha1-decoder` into the model name field and press Download.
+Load `Disty0/sotediffusion-wuerstchen3-alpha1-decoder` after the download is complete.
+Parameters:
+- Sampler: Default
+- Steps: 30 or 40
+- Secondary Steps: 10
+- CFG: 8
+- Secondary CFG: 1 or 1.2
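The SD.Next settings appear to line up with the keyword arguments used in the diffusers code example of this README: the main steps and CFG drive the prior stage, and the secondary ones drive the decoder. The helper below is a hypothetical sketch of that mapping; the function name is made up, and the correspondence is inferred from the code example rather than taken from SD.Next's source.

```python
# Hypothetical helper: convert the SD.Next settings above into keyword arguments
# for the diffusers Stable Cascade pipeline call. ASSUMPTION: "main" settings map
# to the prior stage and "secondary" settings to the decoder stage, as inferred
# from the diffusers code example in this README.
def sdnext_to_diffusers(steps=40, secondary_steps=10, cfg=8.0, secondary_cfg=1.0):
    return {
        "prior_num_inference_steps": steps,        # SD.Next "Steps"
        "prior_guidance_scale": cfg,               # SD.Next "CFG"
        "num_inference_steps": secondary_steps,    # SD.Next "Secondary Steps"
        "decoder_guidance_scale": secondary_cfg,   # SD.Next "Secondary CFG"
    }

kwargs = sdnext_to_diffusers(steps=30, secondary_cfg=1.2)
print(kwargs)
```

Under that assumption, passing `**kwargs` together with `prompt` and `negative_prompt` to the pipeline call should reproduce the UI settings.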
+## ComfyUI
+Please refer to CivitAI: https://civitai.com/models/353284
+# Code Example
+```shell
+pip install diffusers transformers accelerate
+```
+```python
+import torch
+from diffusers import AutoPipelineForText2Image
+device = "cuda"
+dtype = torch.bfloat16
+model = "Disty0/sotediffusion-wuerstchen3-alpha1-decoder"
+pipe = AutoPipelineForText2Image.from_pretrained(model, torch_dtype=dtype)
+# send everything to the gpu:
+pipe = pipe.to(device, dtype=dtype)
+pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
+# or enable model offload to save vram:
+# pipe.enable_model_cpu_offload()
+prompt = "extremely aesthetic, best quality, newest, general, 1girl, solo, looking at viewer, blush, slight smile, cat ears, long hair, dress, bare shoulders, cherry blossoms, flowers, petals, vegetation, wind,"
+negative_prompt = "very displeasing, worst quality, oldest, monochrome, sketch, loli, child,"
+output = pipe(
+    width=1024,
+    height=1536,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    decoder_guidance_scale=1.0,
+    prior_guidance_scale=8.0,
+    prior_num_inference_steps=40,
+    output_type="pil",
+    num_inference_steps=10
+).images[0]
+output.save("sotediffusion.png")  # save the generated PIL image, or process it further
+```
+## Training Status:
+**Alpha0 Release**: This release resets the training and enables Text Encoder training.
+**GPU used for training**: 1x AMD RX 7900 XTX 24GB
+| dataset name | training done | remaining |
+|---|---|---|
+| **newest** | 003 | 228 |
+| **recent** | 003 | 169 |
+| **mid** | 003 | 121 |
+| **early** | 003 | 067 |
+| **oldest** | 003 | 017 |
+| **pixiv** | 003 | 039 |
+| **visual novel cg** | 003 | 025 |
+| **anime wallpaper** | 003 | 010 |
+| **Total** | 32 | 676 |
+**Note**: Chunk numbering starts from 0, so 003 means chunks 0 to 3 (four chunks) are done. Each chunk holds up to 8000 images.
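A short sketch of the arithmetic behind this table, taking the per-dataset chunk totals from the dataset table further down and reading `003` as the last finished chunk index, as the note says. The variable names are illustrative only.

```python
# Recompute the training-status table from the total chunk counts per dataset.
# "Training done" is read as the last finished chunk index; indices start at 0,
# so index 3 means 4 finished chunks.
total_chunks = {
    "newest": 232, "recent": 173, "mid": 125, "early": 71,
    "oldest": 21, "pixiv": 43, "visual novel cg": 29, "anime wallpaper": 14,
}
last_done_index = 3  # the same for every dataset in this release

done = {name: last_done_index + 1 for name in total_chunks}
remaining = {name: total_chunks[name] - done[name] for name in total_chunks}

print(sum(done.values()), sum(remaining.values()))  # 32 676
```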
+## Dataset:
+**GPU used for captioning**: 1x Intel ARC A770 16GB
+**Model used for captioning**: SmilingWolf/wd-swinv2-tagger-v3
+**Command:**
+```
+python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
+```
+| dataset name | total images | total chunk |
+|---|---|---|
+| **newest** | 1.848.331 | 232 |
+| **recent** | 1.380.630 | 173 |
+| **mid** | 993.227 | 125 |
+| **early** | 566.152 | 071 |
+| **oldest** | 160.397 | 021 |
+| **pixiv** | 343.614 | 043 |
+| **visual novel cg** | 231.358 | 029 |
+| **anime wallpaper** | 104.790 | 014 |
+| **Total** | 5.628.499 | 708 |
+**Note**:
+ - Smallest image size is 1280x600 (768.000 pixels)
+ - Deduplicated based on image similarity using czkawka-cli
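The chunk column follows from the image counts if every dataset is cut into chunks of 8000 images (the size stated in the training-status note) and a partly filled last chunk still counts as a chunk. A minimal sketch that reproduces the table under that assumption:

```python
IMAGES_PER_CHUNK = 8000  # chunk size stated in the training-status note

images = {
    "newest": 1_848_331, "recent": 1_380_630, "mid": 993_227, "early": 566_152,
    "oldest": 160_397, "pixiv": 343_614, "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}

# Integer ceiling division: a partially filled last chunk still counts.
chunks = {
    name: (n + IMAGES_PER_CHUNK - 1) // IMAGES_PER_CHUNK
    for name, n in images.items()
}

print(chunks["newest"], sum(images.values()), sum(chunks.values()))  # 232 5628499 708
```

Integer ceiling division is used instead of `math.ceil(n / size)` so that no floating-point rounding is involved.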
+## Tags:
+```
+aesthetic tags, quality tags, date tags, custom tags, rating tags, character tags, rest of the tags
+```
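As an illustration of this ordering, the hypothetical helper below joins tag groups in the listed sequence. The group names and example tags are made up for the sketch; the real captions were produced by the tagging pipeline described above, not by this function.

```python
# Hypothetical helper: join tag groups in the caption order listed above.
TAG_ORDER = ["aesthetic", "quality", "date", "custom", "rating", "character", "rest"]

def build_caption(groups: dict[str, list[str]]) -> str:
    tags = []
    for key in TAG_ORDER:
        tags.extend(groups.get(key, []))  # missing groups are simply skipped
    return ", ".join(tags)

caption = build_caption({
    "rest": ["1girl", "solo", "cat ears"],
    "aesthetic": ["extremely aesthetic"],
    "quality": ["best quality"],
    "date": ["newest"],
    "rating": ["general"],
})
print(caption)
# extremely aesthetic, best quality, newest, general, 1girl, solo, cat ears
```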
+### Date:
+| tag | date |
+|---|---|
+| **newest** | 2022 to 2024 |
+| **recent** | 2019 to 2021 |
+| **mid** | 2015 to 2018 |
+| **early** | 2011 to 2014 |
+| **oldest** | 2005 to 2010 |
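A hypothetical lookup from a post's year to its date tag, following the ranges in the table. The table does not cover years before 2005 or after 2024, so the sketch returns `None` for them rather than guessing a tag.

```python
from typing import Optional

# Year ranges (inclusive) copied from the date table above.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "recent"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]

def date_tag(year: int) -> Optional[str]:
    for start, end, tag in DATE_TAGS:
        if start <= year <= end:
            return tag
    return None  # year not covered by the table

print(date_tag(2023), date_tag(2012), date_tag(2001))  # newest early None
```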
+### Aesthetic Tags:
+**Model used**: shadowlilac/aesthetic-shadow-v2
+| score greater than | tag |
+|---|---|
+| **0.90** | extremely aesthetic |
+| **0.80** | very aesthetic |
+| **0.70** | aesthetic |
+| **0.50** | slightly aesthetic |
+| **0.40** | not displeasing |
+| **0.30** | not aesthetic |
+| **0.20** | slightly displeasing |
+| **0.10** | displeasing |
+| **rest of them** | very displeasing |
+### Quality Tags:
+**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
+| score greater than | tag |
+|---|---|
+| **0.980** | best quality |
+| **0.900** | high quality |
+| **0.750** | great quality |
+| **0.500** | medium quality |
+| **0.250** | normal quality |
+| **0.125** | bad quality |
+| **0.025** | low quality |
+| **rest of them** | worst quality |
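The aesthetic and quality tables follow the same "score greater than" rule: the first threshold, from the top, that the score strictly exceeds picks the tag, and anything below the last row gets the fallback tag. A minimal sketch, assuming the scores from the two models are already computed; the function name is hypothetical.

```python
# Thresholds copied from the aesthetic and quality tables, highest first.
AESTHETIC = [
    (0.90, "extremely aesthetic"), (0.80, "very aesthetic"), (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"), (0.40, "not displeasing"), (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"), (0.10, "displeasing"),
]
QUALITY = [
    (0.980, "best quality"), (0.900, "high quality"), (0.750, "great quality"),
    (0.500, "medium quality"), (0.250, "normal quality"), (0.125, "bad quality"),
    (0.025, "low quality"),
]

def score_to_tag(score, thresholds, fallback):
    for limit, tag in thresholds:  # thresholds are sorted in descending order
        if score > limit:          # strictly greater, as the table headers say
            return tag
    return fallback                # the "rest of them" row

print(score_to_tag(0.95, AESTHETIC, "very displeasing"))  # extremely aesthetic
print(score_to_tag(0.90, AESTHETIC, "very displeasing"))  # very aesthetic
print(score_to_tag(0.01, QUALITY, "worst quality"))       # worst quality
```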
+## Rating Tags
+- general
+- sensitive
+- nsfw
+- explicit nsfw
+## Custom Tags:
+| dataset name | custom tag |
+|---|---|
+| **image boards** | date, |
+| **pixiv** | art by Display_Name, |
+| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
+| **anime wallpaper** | date, anime wallpaper, |
+## Training Params:
+**Software used**: Kohya SD-Scripts (Stable Cascade branch)
+**Base model**: Disty0/sote-diffusion-cascade-alpha0
+### Command:
+```shell
+LD_PRELOAD=/usr/lib/libtcmalloc.so.4 accelerate launch  --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
+--mixed_precision fp16 \
+--save_precision fp16 \
+--full_fp16 \
+--sdpa \
+--gradient_checkpointing \
+--train_text_encoder \
+--resolution "1024,1024" \
+--train_batch_size 2 \
+--gradient_accumulation_steps 8 \
+--learning_rate 1e-5 \
+--learning_rate_te1 1e-5 \
+--lr_scheduler constant_with_warmup \
+--lr_warmup_steps 100 \
+--optimizer_type adafactor \
+--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
+--max_grad_norm 0 \
+--token_warmup_min 1 \
+--token_warmup_step 0 \
+--shuffle_caption \
+--caption_separator ", " \
+--caption_dropout_rate 0 \
+--caption_tag_dropout_rate 0 \
+--caption_dropout_every_n_epochs 0 \
+--dataset_repeats 1 \
+--save_state \
+--save_every_n_steps 256 \
+--sample_every_n_steps 64 \
+--max_token_length 225 \
+--max_train_epochs 1 \
+--caption_extension ".txt" \
+--max_data_loader_n_workers 2 \
+--persistent_data_loader_workers \
+--enable_bucket \
+--min_bucket_reso 256 \
+--max_bucket_reso 4096 \
+--bucket_reso_steps 64 \
+--bucket_no_upscale \
+--log_with tensorboard \
+--output_name sotediffusion-wr3_3b \
+--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0004/0005 \
+--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0004/0005.json \
+--output_dir /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0005 \
+--logging_dir /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0005/logs \
+--resume /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b-state \
+--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b.safetensors \
+--text_model_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/sotediffusion-wr3_3b-4/0004/sotediffusion-wr3_3b_text_model.safetensors \
+--effnet_checkpoint_path /mnt/DataSSD/AI/models/wuerstchen3/effnet_encoder.safetensors \
+--previewer_checkpoint_path /mnt/DataSSD/AI/models/wuerstchen3/previewer.safetensors \
+--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/config/sotediffusion-prompt.txt
+```
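Some back-of-the-envelope numbers for this command, assuming one run covers one 8000-image chunk for a single epoch (as the per-chunk `--train_data_dir` and `--max_train_epochs 1` suggest) on the single GPU listed above. The result is approximate, since aspect-ratio bucketing can leave a few partially filled batches.

```python
# Values taken from the training command above.
train_batch_size = 2
gradient_accumulation_steps = 8
# ASSUMPTION: one full chunk per run, one epoch, one GPU.
images_per_chunk = 8000

# Images contributing to each optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps
# Approximate optimizer steps needed to go through one chunk once.
optimizer_steps_per_chunk = images_per_chunk // effective_batch

print(effective_batch, optimizer_steps_per_chunk)  # 16 500
```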
+## Limitations and Bias
+### Bias
+- This model is intended for anime illustrations.
+  Realistic capabilities are not tested at all.
+- Still underbaked.
+### Limitations
+- Can fall back to a realistic style.
+  Add the "realistic" tag to the negative prompt when this happens.
+- Eyes in far shots can be bad.
+- Anatomy and hands can be bad.
+## License
+(This part is copied directly from Animagine V3.1 and modified.)
+SoteDiffusion models fall under the [Fair AI Public License 1.0-SD](https://freedevproject.org/faipl-1.0-sd/), which is compatible with the Stable Diffusion models’ license. Key points:
+1. **Modification Sharing:** If you modify SoteDiffusion models, you must share both your changes and the original license.
+2. **Source Code Accessibility:** If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
+3. **Distribution Terms:** Any distribution must be under this license or another with similar rules.
+4. **Compliance:** Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
+**Notes**: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, included as LICENSE_INHERIT. This means there is still no commercial use of any kind.