SoteDiffusion Cascade

Anime finetune of Stable Cascade Decoder.
Commercial use is not allowed, due to StabilityAI's license terms.

Code Example

pip install diffusers

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

# Decode the prior's image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")

Dataset

Uses the same dataset as Disty0/sote-diffusion-cascade-decoder_pre-alpha0.
Trained on roughly 98K images.

Training

GPU used for training: 1x AMD RX 7900 XTX 24GB

Software used: https://github.com/2kpr/StableCascade

Config:

experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False

batch_size: 1
grad_accum_steps: 1
updates: 98000
backup_every: 2048
save_every: 1024
warmup_updates: 100

lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True

image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4

checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar

effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-stage_b.safetensors
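
A rough way to read the numbers in this config, as a sketch only. With batch_size 1 and grad_accum_steps 1, 98000 updates consume 98000 samples, which is about one pass over the roughly 98K images. The bucket resolutions printed for multi_aspect_ratio are an assumption: the rounding multiple FACTOR and the area-based formula are illustrative and are not taken from the trainer linked above, which may compute them differently.

```python
from fractions import Fraction

# Values copied from the config above.
UPDATES = 98000
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 1
IMAGE_SIZE = 768
RATIOS = ["1/1", "1/2", "1/3", "2/3", "3/4", "1/5",
          "2/5", "3/5", "4/5", "1/6", "5/6", "9/16"]

# Assumption for illustration only: sides are rounded down to a multiple of 32.
FACTOR = 32


def samples_seen(updates=UPDATES, batch_size=BATCH_SIZE,
                 grad_accum_steps=GRAD_ACCUM_STEPS):
    """Images consumed over the whole run."""
    return updates * batch_size * grad_accum_steps


def bucket_size(ratio, image_size=IMAGE_SIZE, factor=FACTOR):
    """Return (short_side, long_side) with an area of at most image_size**2."""
    r = float(Fraction(ratio))  # short side divided by long side
    area = image_size ** 2
    short_side = int((area * r) ** 0.5 // factor) * factor
    long_side = int((area / r) ** 0.5 // factor) * factor
    return short_side, long_side


print("samples seen:", samples_seen())
for ratio in RATIOS:
    print(ratio, bucket_size(ratio))
```

Whether a ratio such as 1/2 means portrait, landscape, or both depends on the trainer, so the sketch reports only the short and long side.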

Limitations and Bias

Bias

This model is intended for anime illustrations.
Its capabilities for realistic images have not been tested at all.

Limitations

Eyes in far shots are still rendered poorly because of the heavy latent compression.
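
To put the compression in numbers, here is a small sketch. It assumes the roughly 42x spatial compression that Stable Cascade documents for its most compressed latent space (a 1024x1024 image maps to about a 24x24 latent). The constant 42.67 and the 16 px eye width are illustrative assumptions, not measurements from this model.

```python
import math

# Assumed compression factor (about 42x per side); illustrative only.
COMPRESSION = 42.67


def prior_latent_size(height, width, factor=COMPRESSION):
    """Approximate size of the most compressed latent grid for an image."""
    return math.ceil(height / factor), math.ceil(width / factor)


def latent_cells_covered(feature_px, factor=COMPRESSION):
    """Number of latent cells a feature of feature_px pixels spans on one axis."""
    return feature_px / factor


latent_hw = prior_latent_size(1024, 1024)
# Hypothetical far-shot case: an eye that is 16 px wide in the final image.
eye_cells = latent_cells_covered(16)
print("latent grid:", latent_hw)
print("cells per eye:", round(eye_cells, 2))
```

Under these assumptions a small eye spans less than one latent cell, so its detail has to be reconstructed by the later stages rather than composed directly.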
