--- license: mit library_name: diffusers --- # Stage-A-ft-HQ `stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to have slightly-nicer-looking textures. `stage-a-ft-hq` works with any Würstchen-derived model (including [Stable Cascade](https://huggingface.co/stabilityai/stable-cascade)). ## Example comparison | Stable Cascade | Stable Cascade + `stage-a-ft-hq` | | --------------------------------- | ---------------------------------- | | ![](example_baseline.png) | ![](example_finetuned.png) | | ![](example_baseline_closeup.png) | ![](example_finetuned_closeup.png) | ## Explanation Image generators like Würstchen and Stable Cascade create images via a multi-stage process. Stage A is the ultimate stage, responsible for rendering out full-resolution, human-interpretable images (based on the output from prior stages). The original Stage A tends to render slightly-smoothed-out images with a distinctive noise pattern on top. `stage-a-ft-hq` was finetuned briefly on a high-quality dataset in order to reduce these artifacts. ## Suggested Settings To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which [improves mid-level detail](https://old.reddit.com/r/StableDiffusion/comments/1ar359h/cascade_can_generate_directly_at_1536x1536_and/kqhjtk5/)). ## ComfyUI Usage Download the file [`stage_a_ft_hq.safetensors`](https://huggingface.co/madebyollin/stage-a-ft-hq/resolve/main/stage_a_ft_hq.safetensors?download=true), put it in `ComfyUI/models/vae`, and make sure your VAE Loader node is loading this file. (`stage_a_ft_hq.safetensors` includes the [special key](https://github.com/comfyanonymous/ComfyUI/blob/d91f45ef280a5acbdc22f3cc757f8fdbb254261b/comfy/sd.py#L181) that ComfyUI uses to auto-identify Stage A model files) ## 🧨 Diffusers Usage ⚠️ As of 2024-02-17, Stable Cascade's [PR](https://github.com/huggingface/diffusers/pull/6487) is still under review. I've only tested Stable Cascade with this particular version of the PR: ```bash pip install --upgrade --force-reinstall https://github.com/kashif/diffusers/archive/a3dc21385b7386beb3dab3a9845962ede6765887.zip ``` ```py import torch device = "cuda" # Load the Stage-A-ft-HQ model from diffusers.pipelines.wuerstchen import PaellaVQModel stage_a_ft_hq = PaellaVQModel.from_pretrained("madebyollin/stage-a-ft-hq", torch_dtype=torch.float16).to(device) # Load the normal Stable Cascade pipeline from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline num_images_per_prompt = 1 prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device) decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device) # Swap in the Stage-A-ft-HQ model decoder.vqgan = stage_a_ft_hq prompt = "Photograph of Seattle streets on a snowy winter morning" negative_prompt = "" prior_output = prior( prompt=prompt, height=1024, width=1024, negative_prompt=negative_prompt, guidance_scale=4.0, num_images_per_prompt=num_images_per_prompt, num_inference_steps=20 ) decoder_output = decoder( image_embeddings=prior_output.image_embeddings.half(), prompt=prompt, negative_prompt=negative_prompt, guidance_scale=0.0, output_type="pil", num_inference_steps=20 ).images display(decoder_output[0]) ```