---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of Stable Cascade Decoder.
No commercial use, as required by the StabilityAI non-commercial community license.

## Code Example

```shell
pip install torch diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# Load the prior and the finetuned decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

# Stage 1: the prior turns the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

# Stage 2: the decoder turns the image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```

## Dataset

Used the same dataset as SoteDiffusion-Cascade_pre-alpha0.
Selected images from the newest dataset that scored above 0.98 on both the aesthetic and quality taggers.
Trained on ~98K images.
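The selection rule above can be sketched as a simple filter. This is only an illustration: the record layout and the field names `aesthetic_score` and `quality_score` are hypothetical, since the actual tagger outputs and dataset format are not described here.

```python
# Minimal sketch of the "above 0.98 on both taggers" selection rule.
# The dict keys below are hypothetical names, not the real dataset schema.
SCORE_THRESHOLD = 0.98


def select_best(records, threshold=SCORE_THRESHOLD):
    """Keep records whose aesthetic AND quality scores both exceed the threshold."""
    return [
        r for r in records
        if r["aesthetic_score"] > threshold and r["quality_score"] > threshold
    ]


# Made-up example records, for illustration only.
sample = [
    {"file": "a.png", "aesthetic_score": 0.99, "quality_score": 0.995},
    {"file": "b.png", "aesthetic_score": 0.99, "quality_score": 0.90},
    {"file": "c.png", "aesthetic_score": 0.98, "quality_score": 0.99},
]

selected = select_best(sample)
print([r["file"] for r in selected])
```

Only `a.png` passes in this made-up sample: `b.png` fails the quality check, and `c.png` sits exactly at 0.98, which is not "more than" the threshold.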
## Training:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

**Software used**: https://github.com/2kpr/StableCascade

### Config:

```
experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False

batch_size: 64
grad_accum_steps: 64
updates: 3000
backup_every: 128
save_every: 32
warmup_updates: 100

lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True

image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4

checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar
effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/stage_b-generator-049152.safetensors
```

## Limitations and Bias

### Bias

- This model is intended for anime illustrations. Realistic capabilities have not been tested at all.

### Limitations

- Eyes in far shots come out poorly because of the heavy latent compression.
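To give a rough sense of why small, distant eyes suffer, the arithmetic below estimates how many prior latent cells a small feature covers. It assumes the per-side compression factor of about 42.67 that diffusers uses as the default `resolution_multiple` for `StableCascadePriorPipeline`. The 20-pixel eye width is a made-up example value, and the decoder stages use their own compression factors, so treat this as an illustration, not a measurement of this model.

```python
import math

# Assumption: default resolution_multiple of diffusers' StableCascadePriorPipeline.
PRIOR_COMPRESSION = 42.67


def prior_latent_size(pixels: int, compression: float = PRIOR_COMPRESSION) -> int:
    """Side length of the prior latent grid for an image side given in pixels."""
    return math.ceil(pixels / compression)


def latent_cells_covered(feature_pixels: float, compression: float = PRIOR_COMPRESSION) -> float:
    """Approximate number of latent cells spanned by a feature of the given pixel width."""
    return feature_pixels / compression


side = prior_latent_size(1024)
eye_cells = latent_cells_covered(20)  # hypothetical 20 px wide eye in a far shot

print(f"1024 px -> {side} x {side} prior latent grid")
print(f"a 20 px wide eye spans about {eye_cells:.2f} latent cells")
```

Under these assumptions a 1024x1024 image maps to a 24x24 grid, and a 20-pixel eye covers less than half of one latent cell, which leaves little room to encode fine detail.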