SoteDiffusion Wuerstchen3
Anime finetune of Stable Cascade.
The model is currently at a very early stage of training.
Commercial use is not allowed, thanks to StabilityAI.
```shell
pip install diffusers transformers accelerate
```
```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# Load the prior and decoder pipelines in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

# Keep idle submodules on the CPU to reduce VRAM usage.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40,
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```
GPU used for training: 1x AMD RX 7900 XTX 24GB
dataset name | training done | remaining |
---|---|---|
newest | 002 | 218 |
late | 002 | 204 |
mid | 002 | 199 |
early | 002 | 053 |
oldest | 002 | 014 |
pixiv | 002 | 072 |
visual novel cg | 002 | 068 |
anime wallpaper | 002 | 011 |
Total | 24 | 839 |
Note: chunk numbering starts from 0, so 002 means three chunks are done. Each chunk contains 8000 images.
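
The bookkeeping behind this table can be sketched in a few lines of Python. This is a minimal sketch, assuming the "training done" column holds the zero-based index of the last finished chunk, which is consistent with the Total row (8 datasets x 3 chunks = 24). The function name `chunk_progress` is hypothetical.

```python
def chunk_progress(total_chunks: int, last_done_index: int) -> tuple[int, int]:
    """Return (chunks done, chunks remaining) for one dataset.

    Assumes chunk numbering starts from 0, so index 2 means 3 chunks are done.
    """
    done = last_done_index + 1
    if not 0 <= done <= total_chunks:
        raise ValueError("last_done_index is out of range")
    return done, total_chunks - done


# "newest" dataset: 221 chunks in total, last finished chunk is 002.
done, remaining = chunk_progress(total_chunks=221, last_done_index=2)
print(done, remaining)  # 3 218
```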
GPU used for captioning: 1x Intel ARC A770 16GB
Model used for captioning: SmilingWolf/wd-v1-4-convnextv2-tagger-v2
dataset name | total images | total chunks |
---|---|---|
newest | 1,766,335 | 221 |
late | 1,652,420 | 207 |
mid | 1,609,608 | 202 |
early | 442,368 | 056 |
oldest | 128,311 | 017 |
pixiv | 594,046 | 075 |
visual novel cg | 560,903 | 071 |
anime wallpaper | 106,882 | 014 |
Total | 6,860,873 | 863 |
Note: The smallest image size is 1280x600 (768,000 pixels).
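
The chunk counts in the table follow from the image counts and the chunk size of 8000 images. The sketch below reproduces them by rounding up; the names `chunk_count` and `image_counts` are illustrative and not taken from the author's tooling.

```python
import math

IMAGES_PER_CHUNK = 8000  # chunk size stated in this card


def chunk_count(total_images: int, per_chunk: int = IMAGES_PER_CHUNK) -> int:
    """Number of chunks needed to hold all images; the last chunk may be partial."""
    return math.ceil(total_images / per_chunk)


# Image counts copied from the table above.
image_counts = {
    "newest": 1_766_335,
    "late": 1_652_420,
    "mid": 1_609_608,
    "early": 442_368,
    "oldest": 128_311,
    "pixiv": 594_046,
    "visual novel cg": 560_903,
    "anime wallpaper": 106_882,
}

chunks = {name: chunk_count(n) for name, n in image_counts.items()}
for name, n_chunks in chunks.items():
    print(f"{name}: {n_chunks}")

total_chunks = sum(chunks.values())
print("Total:", total_chunks)  # Total: 863
```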
Tag order: aesthetic tags, quality tags, date tags, custom tags, rest of the tags
tag | date |
---|---|
newest | 2022 to 2024 |
late | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
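
As an illustration, the date table can be expressed as a simple year lookup. This is a sketch only: the function name `date_tag` is hypothetical, and returning `None` for years outside the listed ranges is an assumption, since the card does not say how such images are handled.

```python
from typing import Optional

# (tag, first year, last year), copied from the table above.
DATE_RANGES = [
    ("newest", 2022, 2024),
    ("late", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> Optional[str]:
    """Map an upload year to its date tag, or None if the year is not covered."""
    for tag, first, last in DATE_RANGES:
        if first <= year <= last:
            return tag
    return None  # assumption: years outside 2005-2024 get no date tag


print(date_tag(2023))  # newest
print(date_tag(2012))  # early
```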
Model used for aesthetic tags: shadowlilac/aesthetic-shadow
score greater than | tag |
---|---|
0.980 | extremely aesthetic |
0.900 | very aesthetic |
0.750 | aesthetic |
0.500 | slightly aesthetic |
0.350 | not displeasing |
0.250 | not aesthetic |
0.125 | slightly displeasing |
0.025 | displeasing |
rest of them | very displeasing |
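
The thresholds above amount to a first-match lookup from highest to lowest. The following is a minimal sketch, assuming the scorer yields a value between 0 and 1; how the score is obtained from the model is not shown here, and the function name `aesthetic_tag` is hypothetical.

```python
# (threshold, tag) pairs copied from the table above, highest first.
AESTHETIC_THRESHOLDS = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Return the first tag whose threshold the score strictly exceeds."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"  # "rest of them" row


print(aesthetic_tag(0.99))  # extremely aesthetic
print(aesthetic_tag(0.30))  # not aesthetic
```

Because the table says "greater than", a score exactly equal to a threshold falls into the next lower tag in this sketch.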
Model used for quality tags: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag |
---|---|
0.980 | best quality |
0.900 | high quality |
0.750 | great quality |
0.500 | medium quality |
0.250 | normal quality |
0.125 | bad quality |
0.025 | low quality |
rest of them | worst quality |
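
One compact way to express this table is a binary search over the sorted thresholds. This sketch uses Python's standard `bisect` module and assumes the score is a float between 0 and 1; the function name `quality_tag` is hypothetical and the scoring model itself is not invoked.

```python
from bisect import bisect_left

# Thresholds from the table above, in ascending order.
QUALITY_THRESHOLDS = [0.025, 0.125, 0.250, 0.500, 0.750, 0.900, 0.980]

# One more tag than thresholds: index i is used when the score strictly
# exceeds exactly i of the thresholds.
QUALITY_TAGS = [
    "worst quality",
    "low quality",
    "bad quality",
    "normal quality",
    "medium quality",
    "great quality",
    "high quality",
    "best quality",
]


def quality_tag(score: float) -> str:
    """Map a quality score to its tag using 'score greater than' semantics."""
    # bisect_left counts the thresholds that are strictly below the score.
    return QUALITY_TAGS[bisect_left(QUALITY_THRESHOLDS, score)]


print(quality_tag(0.95))  # high quality
print(quality_tag(0.50))  # normal quality
```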
dataset name | custom tag |
---|---|
image boards | date, |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
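
Putting the pieces together, a caption could be assembled in the tag order listed earlier (aesthetic, quality, date, custom, rest). This is only a sketch under stated assumptions: the `build_caption` helper, the comma separator, and the omission of a trailing comma are all illustrative choices, not the author's actual captioning code.

```python
from typing import Optional, Sequence


def build_caption(
    aesthetic: str,
    quality: str,
    date: Optional[str],
    custom_tags: Sequence[str],
    other_tags: Sequence[str],
) -> str:
    """Join tags in the order: aesthetic, quality, date, custom, rest."""
    parts = [aesthetic, quality, date, *custom_tags, *other_tags]
    # Skip missing entries, e.g. when a dataset has no date tag.
    return ", ".join(part for part in parts if part)


# Hypothetical pixiv example using the placeholder from the table above.
caption = build_caption(
    aesthetic="very aesthetic",
    quality="high quality",
    date=None,
    custom_tags=["art by Display_Name"],
    other_tags=["1girl", "solo"],
)
print(caption)  # very aesthetic, high quality, art by Display_Name, 1girl, solo
```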
Software used: Kohya SD-Scripts with Stable Cascade branch
Base model: KBlueLeaf/Stable-Cascade-FP16-fixed
```shell
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
  --mixed_precision fp16 \
  --save_precision fp16 \
  --full_fp16 \
  --sdpa \
  --gradient_checkpointing \
  --resolution "1024,1024" \
  --train_batch_size 2 \
  --gradient_accumulation_steps 32 \
  --adaptive_loss_weight \
  --learning_rate 4e-6 \
  --lr_scheduler constant_with_warmup \
  --lr_warmup_steps 100 \
  --optimizer_type adafactor \
  --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
  --max_grad_norm 0 \
  --token_warmup_min 1 \
  --token_warmup_step 0 \
  --shuffle_caption \
  --caption_dropout_rate 0 \
  --caption_tag_dropout_rate 0 \
  --caption_dropout_every_n_epochs 0 \
  --dataset_repeats 1 \
  --save_state \
  --save_every_n_steps 128 \
  --sample_every_n_steps 32 \
  --max_token_length 225 \
  --max_train_epochs 1 \
  --caption_extension ".txt" \
  --max_data_loader_n_workers 2 \
  --persistent_data_loader_workers \
  --enable_bucket \
  --min_bucket_reso 256 \
  --max_bucket_reso 4096 \
  --bucket_reso_steps 64 \
  --bucket_no_upscale \
  --log_with tensorboard \
  --output_name sotediffusion-sc_3b \
  --train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
  --in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
  --output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
  --logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
  --resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
  --stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
  --effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
  --previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
  --sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
```
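
For a rough sense of scale, the batch settings in the command translate into an effective batch size and a step count per chunk. This is back-of-the-envelope arithmetic, assuming a single GPU, that one step means one optimizer update, and that bucketing does not change the number of batches; the variable names are illustrative.

```python
import math

# Values copied from the training command above.
train_batch_size = 2
gradient_accumulation_steps = 32

# Assumptions, not part of the command: one GPU and 8000 images per chunk,
# as stated elsewhere in this card.
num_gpus = 1
images_per_chunk = 8000

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_chunk = math.ceil(images_per_chunk / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")  # effective batch size: 64
print(f"optimizer steps per chunk: {steps_per_chunk}")  # optimizer steps per chunk: 125
```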