SoteDiffusion Wuerstchen3
Anime finetune of Stable Cascade.
The model is currently at a very early stage of training.
Commercial use is not permitted because of StabilityAI's license terms.
pip install diffusers
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

# Load the prior and the decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

# Step 1: the prior turns the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

# Step 2: the decoder turns the image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
Alpha0 Release: This release resets the training and enables Text Encoder training.
GPU used for training: 1x AMD RX 7900 XTX 24GB
dataset name | chunks trained | chunks remaining |
---|---|---|
newest | 000 | 230 |
recent | 000 | 206 |
mid | 000 | 201 |
early | 000 | 055 |
oldest | 000 | 016 |
pixiv | 000 | 074 |
visual novel cg | 000 | 070 |
anime wallpaper | 000 | 013 |
Total | 8 | 865 |
Note: chunk numbering starts from 0, and each chunk contains 8000 images.
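The chunk bookkeeping described in the note can be sketched in a few lines. This is only an illustration, not part of the training scripts: `chunk_count` and `chunk_index` are hypothetical helper names, and the only facts taken from the note are the 8000 images per chunk and the zero-based numbering.

```python
IMAGES_PER_CHUNK = 8000  # stated in the note above


def chunk_count(total_images: int) -> int:
    """Number of chunks needed for total_images (the last chunk may be partial)."""
    return (total_images + IMAGES_PER_CHUNK - 1) // IMAGES_PER_CHUNK


def chunk_index(image_position: int) -> int:
    """Zero-based chunk index of the image at zero-based position image_position."""
    return image_position // IMAGES_PER_CHUNK


if __name__ == "__main__":
    total = 20_000  # hypothetical dataset size, for illustration only
    print(chunk_count(total))      # 3 chunks, numbered 0, 1 and 2
    print(chunk_index(total - 1))  # 2, the index of the last chunk
```

Because numbering starts at 0, the index of the last chunk is always one less than the chunk count.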
GPU used for captioning: 1x Intel ARC A770 16GB
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Command:
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py \
--model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" \
--repo_id "SmilingWolf/wd-swinv2-tagger-v3" \
--recursive \
--remove_underscore \
--use_rating_tags \
--character_tags_first \
--character_tag_expand \
--append_tags \
--onnx \
--caption_separator ", " \
--general_threshold 0.35 \
--character_threshold 0.50 \
--batch_size 4 \
--caption_extension ".txt" \
./
dataset name | total images | total chunks |
---|---|---|
newest | 1.843.053 | 231 |
recent | 1.652.420 | 207 |
mid | 1.609.608 | 202 |
early | 442.368 | 056 |
oldest | 128.311 | 017 |
pixiv | 594.046 | 075 |
visual novel cg | 560.903 | 071 |
anime wallpaper | 106.882 | 014 |
Total | 6.937.591 | 873 |
Note: The smallest image size is 1280x600 (768.000 pixels).
Tag order: aesthetic tags, quality tags, date tags, custom tags, rating tags, character tags, rest of the tags
tag | date |
---|---|
newest | 2022 to 2024 |
recent | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
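The date table can be read as a simple lookup from an image's year to its tag. The sketch below is illustrative only: `date_tag` is a hypothetical helper, and returning `None` for years outside the table is an assumption, since the table does not say how such images are handled.

```python
from typing import Optional

# (first year, last year, tag), copied from the table above; both ends inclusive.
DATE_RANGES = [
    (2022, 2024, "newest"),
    (2019, 2021, "recent"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the year is not in the table."""
    for first, last, tag in DATE_RANGES:
        if first <= year <= last:
            return tag
    return None  # assumption: years outside the table get no date tag


if __name__ == "__main__":
    print(date_tag(2023))  # newest
    print(date_tag(2016))  # mid
```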
Model used: shadowlilac/aesthetic-shadow-v2
score greater than | tag |
---|---|
0.90 | extremely aesthetic |
0.80 | very aesthetic |
0.70 | aesthetic |
0.50 | slightly aesthetic |
0.40 | not displeasing |
0.30 | not aesthetic |
0.20 | slightly displeasing |
0.10 | displeasing |
rest of them | very displeasing |
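As a rough illustration of how the table above maps a score to an aesthetic tag, the sketch below walks the thresholds from highest to lowest and returns the first one the score exceeds. `aesthetic_tag` is a hypothetical helper. The sketch assumes the score is already a float between 0 and 1 and does not cover how it is obtained from shadowlilac/aesthetic-shadow-v2.

```python
# (threshold, tag) pairs from the table above, highest threshold first.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Return the tag of the first threshold that the score is strictly greater than."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"  # the "rest of them" row


if __name__ == "__main__":
    print(aesthetic_tag(0.95))  # extremely aesthetic
    print(aesthetic_tag(0.05))  # very displeasing
```

The comparison is strict because the column is labelled "score greater than", so a score of exactly 0.90 falls into the next row down.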
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag |
---|---|
0.980 | best quality |
0.900 | high quality |
0.750 | great quality |
0.500 | medium quality |
0.250 | normal quality |
0.125 | bad quality |
0.025 | low quality |
rest of them | worst quality |
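The quality thresholds can also be applied with a binary search over the sorted cut-offs. This is a minimal sketch, not the code used to tag the dataset: `quality_tag` is a hypothetical helper, and the score is assumed to be a float already produced by the aesthetic model linked above.

```python
from bisect import bisect_left

# Thresholds from the table above in ascending order.
QUALITY_THRESHOLDS = [0.025, 0.125, 0.250, 0.500, 0.750, 0.900, 0.980]

# QUALITY_TAGS[i] applies when exactly i thresholds are strictly below the score.
QUALITY_TAGS = [
    "worst quality",
    "low quality",
    "bad quality",
    "normal quality",
    "medium quality",
    "great quality",
    "high quality",
    "best quality",
]


def quality_tag(score: float) -> str:
    """Map a score to its quality tag using 'strictly greater than' semantics."""
    # bisect_left counts the thresholds that are strictly smaller than score.
    return QUALITY_TAGS[bisect_left(QUALITY_THRESHOLDS, score)]


if __name__ == "__main__":
    print(quality_tag(0.99))  # best quality
    print(quality_tag(0.60))  # medium quality
```

`bisect_left` is used rather than `bisect_right` so that a score equal to a threshold does not count as exceeding it.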
dataset name | custom tag |
---|---|
image boards | date, |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
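To make the custom tag patterns concrete, here is a small sketch that builds the custom tags for one image from its dataset name and some metadata. Everything beyond the table itself is an assumption: `custom_tags` is a hypothetical helper, the metadata keys (`date`, `display_name`, `vn_name`, `vn_short_name`) are invented for illustration, and the example values are placeholders.

```python
from typing import Dict, List


def custom_tags(dataset: str, meta: Dict[str, str]) -> List[str]:
    """Build the custom tags for one image, following the table above."""
    if dataset == "image boards":
        return [meta["date"]]
    if dataset == "pixiv":
        return [f"art by {meta['display_name']}"]
    if dataset == "visual novel cg":
        return [f"{meta['vn_name']} ({meta['vn_short_name']})", "visual novel cg"]
    if dataset == "anime wallpaper":
        return [meta["date"], "anime wallpaper"]
    raise ValueError(f"unknown dataset: {dataset}")


if __name__ == "__main__":
    tags = custom_tags("anime wallpaper", {"date": "newest"})
    # Join with the same ", " separator used for the caption files.
    print(", ".join(tags) + ",")  # newest, anime wallpaper,
```

Indexing the metadata dictionary directly means a missing field raises `KeyError` instead of silently producing an incomplete tag.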
Software used: Kohya SD-Scripts (Stable Cascade branch)
Base model: Disty0/sote-diffusion-cascade_pre-alpha0
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
--mixed_precision fp16 \
--save_precision fp16 \
--full_fp16 \
--sdpa \
--gradient_checkpointing \
--train_text_encoder \
--resolution "1024,1024" \
--train_batch_size 2 \
--adaptive_loss_weight \
--learning_rate 4e-6 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 100 \
--optimizer_type adafactor \
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
--max_grad_norm 0 \
--token_warmup_min 1 \
--token_warmup_step 0 \
--shuffle_caption \
--caption_dropout_rate 0 \
--caption_tag_dropout_rate 0 \
--caption_dropout_every_n_epochs 0 \
--dataset_repeats 1 \
--save_state \
--save_every_n_steps 2048 \
--sample_every_n_steps 512 \
--max_token_length 225 \
--max_train_epochs 1 \
--caption_extension ".txt" \
--max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--enable_bucket \
--min_bucket_reso 256 \
--max_bucket_reso 4096 \
--bucket_reso_steps 64 \
--bucket_no_upscale \
--log_with tensorboard \
--output_name sotediffusion-sc_3b \
--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0000 \
--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0000.json \
--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-0 \
--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-0/logs \
--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480-state \
--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480.safetensors \
--text_model_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480_text_model.safetensors \
--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/config/sotediffusion-prompt.txt