---
license: mit
library_name: diffusers
---
# Stage-A-ft-HQ
`stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to produce slightly nicer-looking textures.
`stage-a-ft-hq` works with any Würstchen-derived model (including [Stable Cascade](https://huggingface.co/stabilityai/stable-cascade)).
## Example comparison
| Stable Cascade | Stable Cascade + `stage-a-ft-hq` |
| --------------------------------- | ---------------------------------- |
| ![](example_baseline.png) | ![](example_finetuned.png) |
| ![](example_baseline_closeup.png) | ![](example_finetuned_closeup.png) |
## Explanation
Image generators like Würstchen and Stable Cascade create images via a multi-stage process.
Stage A is the final stage, responsible for rendering full-resolution, human-interpretable images (based on the output from prior stages).
The original Stage A tends to render slightly smoothed-out images with a distinctive noise pattern on top.
`stage-a-ft-hq` was finetuned briefly on a high-quality dataset in order to reduce these artifacts.
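To make the multi-stage process concrete, here is a rough sketch of the spatial size each Stable Cascade stage works at for a requested image size. The constants are assumptions that mirror the defaults of diffusers' `StableCascadePriorPipeline` (`resolution_multiple`) and `StableCascadeDecoderPipeline` (`latent_dim_scale`), plus Stage A's 4x spatial compression; other Würstchen-derived models may use different values.

```python
import math

# Assumed constants, mirroring defaults in diffusers' Stable Cascade pipelines;
# check them against your installed version.
STAGE_C_RESOLUTION_MULTIPLE = 42.67  # StableCascadePriorPipeline.resolution_multiple
STAGE_B_LATENT_DIM_SCALE = 10.67     # StableCascadeDecoderPipeline.latent_dim_scale
STAGE_A_DOWNSCALE = 4                # Stage A is a VQGAN with 4x spatial compression


def stage_shapes(height: int, width: int) -> dict:
    """Return the (height, width) handled at each stage for a requested image size."""
    # Stage C (the prior) generates a small, highly compressed latent.
    c_h = math.ceil(height / STAGE_C_RESOLUTION_MULTIPLE)
    c_w = math.ceil(width / STAGE_C_RESOLUTION_MULTIPLE)
    # Stage B upsamples that into a Stage A latent.
    a_lat_h = int(c_h * STAGE_B_LATENT_DIM_SCALE)
    a_lat_w = int(c_w * STAGE_B_LATENT_DIM_SCALE)
    # Stage A decodes its latent into full-resolution pixels.
    img_h = a_lat_h * STAGE_A_DOWNSCALE
    img_w = a_lat_w * STAGE_A_DOWNSCALE
    return {
        "stage_c_latent": (c_h, c_w),
        "stage_a_latent": (a_lat_h, a_lat_w),
        "image": (img_h, img_w),
    }
```

For a 1024x1024 request this gives a 24x24 Stage C latent, a 256x256 Stage A latent, and a 1024x1024 image, so Stage A is the only stage that works at full pixel resolution.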
## Suggested Settings
To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which [improves mid-level detail](https://old.reddit.com/r/StableDiffusion/comments/1ar359h/cascade_can_generate_directly_at_1536x1536_and/kqhjtk5/)).
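In diffusers' `StableCascadeDecoderPipeline`, the decoder's `num_inference_steps` is the Stage B step count; Stage A then decodes in a single forward pass, so it has no step count of its own. A minimal sketch of a helper that makes this explicit (`run_stage_b` and `stage_b_steps` are hypothetical names, not part of diffusers):

```python
def run_stage_b(decoder, image_embeddings, prompt, stage_b_steps, negative_prompt=""):
    """Run a Stable Cascade decoder pipeline (Stage B, then Stage A) with a chosen Stage B step count.

    `decoder` is assumed to be a diffusers StableCascadeDecoderPipeline; its
    `num_inference_steps` controls how many Stage B denoising steps run.
    """
    return decoder(
        image_embeddings=image_embeddings,
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,
        output_type="pil",
        num_inference_steps=stage_b_steps,  # Stage B step count
    ).images
```

For example, with the `decoder` and `prior_output` from the Diffusers section: `run_stage_b(decoder, prior_output.image_embeddings.half(), prompt, stage_b_steps=40)`. The value 40 is only illustrative (the Diffusers example uses 20); this card doesn't specify an exact count.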
## ComfyUI Usage
Download the file [`stage_a_ft_hq.safetensors`](https://huggingface.co/madebyollin/stage-a-ft-hq/resolve/main/stage_a_ft_hq.safetensors?download=true), put it in `ComfyUI/models/vae`, and make sure your VAE Loader node is loading this file.
(`stage_a_ft_hq.safetensors` includes the [special key](https://github.com/comfyanonymous/ComfyUI/blob/d91f45ef280a5acbdc22f3cc757f8fdbb254261b/comfy/sd.py#L181) that ComfyUI uses to auto-identify Stage A model files.)
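To check whether a `.safetensors` file carries that key without loading any weights, you can read the file's JSON header directly (a safetensors file starts with an 8-byte little-endian header length, followed by a JSON header describing each tensor). A stdlib-only sketch; the key name `vquantizer.codebook.weight` is an assumption based on ComfyUI's Stage A detection logic, so confirm it against the linked `sd.py` line:

```python
import json
import struct

# Assumption: the key ComfyUI checks to recognize a Stage A (VQGAN) file.
STAGE_A_KEY = "vquantizer.codebook.weight"


def safetensors_keys(path) -> list:
    """List tensor names in a .safetensors file by reading only its JSON header."""
    with open(path, "rb") as f:
        # Unsigned 64-bit little-endian header length...
        (header_len,) = struct.unpack("<Q", f.read(8))
        # ...followed by that many bytes of UTF-8 JSON.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional entry that is not a tensor.
    return [name for name in header if name != "__metadata__"]


def looks_like_stage_a(path) -> bool:
    """True if the file contains the (assumed) Stage A marker key."""
    return STAGE_A_KEY in safetensors_keys(path)
```

If the assumed key name is right, `looks_like_stage_a("ComfyUI/models/vae/stage_a_ft_hq.safetensors")` returns `True` for the downloaded file.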
## 🧨 Diffusers Usage
⚠️ As of 2024-02-17, Stable Cascade's [PR](https://github.com/huggingface/diffusers/pull/6487) is still under review.
I've only tested Stable Cascade with this particular version of the PR:
```bash
pip install --upgrade --force-reinstall https://github.com/kashif/diffusers/archive/a3dc21385b7386beb3dab3a9845962ede6765887.zip
```
```py
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
from diffusers.pipelines.wuerstchen import PaellaVQModel

device = "cuda"
num_images_per_prompt = 1

# Load the Stage-A-ft-HQ model
stage_a_ft_hq = PaellaVQModel.from_pretrained("madebyollin/stage-a-ft-hq", torch_dtype=torch.float16).to(device)

# Load the normal Stable Cascade pipeline (prior = Stage C, decoder = Stage B + Stage A)
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

# Swap in the Stage-A-ft-HQ model
decoder.vqgan = stage_a_ft_hq

prompt = "Photograph of Seattle streets on a snowy winter morning"
negative_prompt = ""

# Stage C: generate image embeddings from the prompt
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20,
)

# Stage B + Stage A: decode the embeddings into images
# (the prior runs in bfloat16 and the decoder in float16, hence .half())
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=20,
).images

# display() only exists in Jupyter/IPython; saving to disk also works in a plain script
decoder_output[0].save("stage_a_ft_hq_example.png")
```
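To reproduce a baseline-vs-finetuned comparison like the one at the top of this card, one approach is to run Stage B once and decode the same latents with both Stage A models. A minimal sketch, assuming your diffusers build supports `output_type="latent"` on the decoder and that its final step scales latents by `vqgan.config.scale_factor` before calling `vqgan.decode(...)` (true of recent diffusers releases, but not verified against the pinned PR build); `decode_with_stage_a` is a hypothetical helper:

```python
def decode_with_stage_a(vqgan, latents):
    """Decode Stage B output latents into images with a given Stage A model.

    Assumption: this mirrors the last step of diffusers' StableCascadeDecoderPipeline,
    which scales the latents, runs the VQGAN decoder, and clamps pixels to [0, 1].
    """
    scaled = vqgan.config.scale_factor * latents
    return vqgan.decode(scaled).sample.clamp(0, 1)
```

Usage: save `original_stage_a = decoder.vqgan` before the swap line, call `decoder(...)` with `output_type="latent"` to get the Stage B latents from `.images`, then call `decode_with_stage_a(original_stage_a, latents)` and `decode_with_stage_a(stage_a_ft_hq, latents)` inside `torch.no_grad()`. Each returns a float tensor in [0, 1] with shape (batch, 3, height, width).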