---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of Stable Cascade. It is currently in a very early stage of training. Commercial use is not permitted under StabilityAI's non-commercial license.

## Code Example

```shell
pip install diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# The prior (Stage C) turns the prompt into image embeddings; the decoder turns those embeddings into an image.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_Decoder", torch_dtype=torch.float16)

# Keep idle submodules on the CPU to reduce VRAM usage (requires accelerate).
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=30
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.0,  # 1.0 disables classifier-free guidance in the decoder
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```

## Training Status:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

| dataset name | training done | remaining |
|---|---|---|
| **newest** | 002 | 218 |
| **late** | 002 | 204 |
| **mid** | 002 | 199 |
| **early** | 002 | 053 |
| **oldest** | 002 | 014 |
| **pixiv** | 002 | 072 |
| **visual novel cg** | 002 | 068 |
| **anime wallpaper** | 002 | 011 |
| **Total** | 24 | 839 |

**Note**: Chunk indices start from 0, so `002` means chunks 0 to 2 (three chunks) are done for that dataset, while the Total row counts chunks. Each chunk contains 8000 images.

## Dataset:

**GPU used for captioning**: 1x Intel ARC A770 16GB

**Model used for captioning**: SmilingWolf|wd-v1-4-convnextv2-tagger-v2

| dataset name | total images | total chunks |
|---|---|---|
| **newest** | 1.75M | 221 |
| **late** | 1.65M | 207 |
| **mid** | 1.60M | 202 |
| **early** | 442K | 056 |
| **oldest** | 128K | 017 |
| **pixiv** | 594K | 075 |
| **visual novel cg** | 560K | 071 |
| **anime wallpaper** | 106K | 014 |
| **Total** | 6,860,873 | 863 |

**Note**: The smallest image size is 1280x600 (768,000 pixels).

## Tags:

```
aesthetic tags, quality tags, custom tags, date tags, rest of the tags
```

### Date:

| tag | date |
|---|---|
| **newest** | 2022 to 2024 |
| **late** | 2019 to 2021 |
| **mid** | 2015 to 2018 |
| **early** | 2011 to 2014 |
| **oldest** | 2005 to 2010 |

### Aesthetic Tags:

**Model used**: shadowlilac/aesthetic-shadow

| score greater than | tag |
|---|---|
| **0.980** | extremely aesthetic |
| **0.900** | very aesthetic |
| **0.750** | aesthetic |
| **0.500** | slightly aesthetic |
| **0.350** | not displeasing |
| **0.250** | not aesthetic |
| **0.125** | slightly displeasing |
| **0.025** | displeasing |
| **rest of them** | very displeasing |

### Quality Tags:

**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

| score greater than | tag |
|---|---|
| **0.980** | best quality |
| **0.900** | high quality |
| **0.750** | great quality |
| **0.500** | medium quality |
| **0.250** | normal quality |
| **0.125** | bad quality |
| **0.025** | low quality |
| **rest of them** | worst quality |

### Custom Tags:

| dataset name | custom tag |
|---|---|
| **booru** | date, |
| **pixiv** | art by Display_Name, |
| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
| **anime wallpaper** | anime wallpaper, |

## Training Params:

**Software used**: Kohya SD-Scripts (Stable Cascade branch)

**Base model**: KBlueLeaf/Stable-Cascade-FP16-fixed

### Command:

```
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
  --mixed_precision fp16 \
  --save_precision fp16 \
  --full_fp16 \
  --sdpa \
  --gradient_checkpointing \
  --resolution "1024,1024" \
  --train_batch_size 2 \
  --gradient_accumulation_steps 32 \
  --adaptive_loss_weight \
  --learning_rate 4e-6 \
  --lr_scheduler constant_with_warmup \
  --lr_warmup_steps 100 \
  --optimizer_type adafactor \
  --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
  --max_grad_norm 0 \
  --token_warmup_min 1 \
  --token_warmup_step 0 \
  --shuffle_caption \
  --caption_dropout_rate 0 \
  --caption_tag_dropout_rate 0 \
  --caption_dropout_every_n_epochs 0 \
  --dataset_repeats 1 \
  --save_state \
  --save_every_n_steps 128 \
  --sample_every_n_steps 32 \
  --max_token_length 225 \
  --max_train_epochs 1 \
  --caption_extension ".txt" \
  --max_data_loader_n_workers 2 \
  --persistent_data_loader_workers \
  --enable_bucket \
  --min_bucket_reso 256 \
  --max_bucket_reso 4096 \
  --bucket_reso_steps 64 \
  --bucket_no_upscale \
  --log_with tensorboard \
  --output_name sotediffusion-sc_3b \
  --train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
  --in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
  --output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
  --logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
  --resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
  --stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
  --effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
  --previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
  --sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
```

## Limitations and Bias

### Bias

- This model is intended for anime illustrations. Realistic capabilities have not been tested at all.
- The current version is biased toward older anime styles.

### Limitations

- Can fall back to a realistic style. Use the "anime illustration" tag to steer it in the right direction.
- Eyes in far shots are poorly rendered because of the heavy latent compression.
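## Caption Format Sketch

The Tags section describes each caption as a set of threshold lookups joined in a fixed order. The Python sketch below illustrates that scheme under assumptions the model card does not state: the aesthetic and quality scores are floats in [0, 1], "score greater than" is a strict comparison, and the date tag is picked from the image's year. The helper names (`first_above`, `aesthetic_tag`, `quality_tag`, `date_tag`, `build_caption`) are hypothetical, and this is not the script that was used to caption the dataset.

```python
# Hypothetical sketch of the caption scheme from the Tags section.
# Not the author's captioning script; all names here are illustrative.
from typing import List, Optional, Sequence, Tuple

# (threshold, tag) pairs from the Aesthetic Tags table, highest threshold first.
AESTHETIC_THRESHOLDS: List[Tuple[float, str]] = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]

# (threshold, tag) pairs from the Quality Tags table, highest threshold first.
QUALITY_THRESHOLDS: List[Tuple[float, str]] = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]

# (first year, last year, tag) from the Date table.
DATE_RANGES: List[Tuple[int, int, str]] = [
    (2022, 2024, "newest"),
    (2019, 2021, "late"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def first_above(score: float, table: Sequence[Tuple[float, str]], fallback: str) -> str:
    """Return the tag of the first threshold that the score strictly exceeds."""
    for threshold, tag in table:
        if score > threshold:
            return tag
    return fallback  # the "rest of them" row


def aesthetic_tag(score: float) -> str:
    return first_above(score, AESTHETIC_THRESHOLDS, "very displeasing")


def quality_tag(score: float) -> str:
    return first_above(score, QUALITY_THRESHOLDS, "worst quality")


def date_tag(year: int) -> Optional[str]:
    for first, last, tag in DATE_RANGES:
        if first <= year <= last:
            return tag
    return None  # assumption: years outside 2005-2024 get no date tag


def build_caption(aesthetic_score: float, quality_score: float, year: int,
                  custom_tags: Sequence[str], other_tags: Sequence[str]) -> str:
    """Join tags in the documented order: aesthetic, quality, custom, date, rest."""
    parts = [aesthetic_tag(aesthetic_score), quality_tag(quality_score), *custom_tags]
    year_tag = date_tag(year)
    if year_tag is not None:
        parts.append(year_tag)
    parts.extend(other_tags)
    return ", ".join(parts)


print(build_caption(0.99, 0.95, 2023, [], ["1girl", "solo", "cat ears"]))
# -> extremely aesthetic, high quality, newest, 1girl, solo, cat ears
```

Custom tags from the Custom Tags table go through `custom_tags`; for a pixiv image, `build_caption(0.6, 0.3, 2016, ["art by Display_Name"], ["1girl"])` returns `slightly aesthetic, normal quality, art by Display_Name, mid, 1girl`.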
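## Training Progress Sketch

The Training Status table lists the index of the last finished chunk for each dataset rather than a count. This Python sketch turns those rows into totals and relates the chunk size to the training command. It assumes every finished chunk holds the full 8000 images and that one optimizer step is taken every 32 micro-batches (`--gradient_accumulation_steps`) on the single GPU. The names `STATUS` and `chunks_done` are hypothetical, and the steps-per-chunk figure is only approximate because aspect-ratio bucketing can produce partial batches.

```python
# Hypothetical helper for reading the Training Status table; not part of the training tooling.
IMAGES_PER_CHUNK = 8000  # from the note under the Training Status table
TRAIN_BATCH_SIZE = 2     # --train_batch_size
GRAD_ACCUM_STEPS = 32    # --gradient_accumulation_steps
NUM_GPUS = 1             # 1x AMD RX 7900 XTX

# dataset name -> (last finished chunk index, remaining chunks), copied from the table
STATUS = {
    "newest": (2, 218),
    "late": (2, 204),
    "mid": (2, 199),
    "early": (2, 53),
    "oldest": (2, 14),
    "pixiv": (2, 72),
    "visual novel cg": (2, 68),
    "anime wallpaper": (2, 11),
}


def chunks_done(last_index: int) -> int:
    """Chunk indices start from 0, so finishing index N means N + 1 chunks are done."""
    return last_index + 1


done = sum(chunks_done(last) for last, _ in STATUS.values())
remaining = sum(rem for _, rem in STATUS.values())
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS
# Approximate: aspect-ratio bucketing can leave partial batches, adding a few steps.
steps_per_chunk = IMAGES_PER_CHUNK / effective_batch

print(f"chunks done: {done}, remaining: {remaining}, total: {done + remaining}")
print(f"images trained: {done * IMAGES_PER_CHUNK:,}")
print(f"effective batch size: {effective_batch}, ~{steps_per_chunk:.0f} optimizer steps per chunk")
```

Running it reports 24 chunks done, 839 remaining and 863 in total, which matches the Total rows of the Training Status and Dataset tables, along with an effective batch size of 64 and roughly 125 optimizer steps per chunk.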