Commit 81a553d by madebyollin
1 parent: 7a79e2c

Update README.md

Files changed (1)
  1. README.md +60 -8
README.md CHANGED
@@ -5,26 +5,78 @@ library_name: diffusers
 
 # Stage-A-ft-HQ
 
-`stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to generate sharper details and textures. <br/>
+`stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to have slightly-nicer-looking textures.
+
 `stage-a-ft-hq` works with any Würstchen-derived model (including [Stable Cascade](https://huggingface.co/stabilityai/stable-cascade)).
 
-> TODO: comparison goes here
+## Example comparison
+
+| Stable Cascade | Stable Cascade + `stage-a-ft-hq` |
+| --------------------------------- | ---------------------------------- |
+| ![](example_baseline.png) | ![](example_finetuned.png) |
+| ![](example_baseline_closeup.png) | ![](example_finetuned_closeup.png) |
+
 
 ## 🧨 Diffusers Usage
 
+⚠️ As of 2024-02-17, Stable Cascade's [PR](https://github.com/huggingface/diffusers/pull/6487) is still under review.
+I've only confirmed Stable Cascade working with this particular version of the PR:
+```bash
+pip install --upgrade --force-reinstall https://github.com/kashif/diffusers/archive/a3dc21385b7386beb3dab3a9845962ede6765887.zip
+```
+
 ```py
-# TODO: code goes here
+import torch
+
+# Load the Stage-A-ft-HQ model
+from diffusers.pipelines.wuerstchen import PaellaVQModel
+stage_a_ft_hq = PaellaVQModel.from_pretrained("madebyollin/stage_a_ft_hq", torch_dtype=torch.float16)
+
+# Load the normal Stable Cascade pipeline
+from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
+
+device = "cuda"
+num_images_per_prompt = 2
+
+prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
+decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)
+
+# Swap in the Stage-A-ft-HQ model
+decoder.vqgan = stage_a_ft_hq
+
+prompt = "Anthropomorphic cat dressed as a pilot"
+negative_prompt = ""
+
+prior_output = prior(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    negative_prompt=negative_prompt,
+    guidance_scale=4.0,
+    num_images_per_prompt=num_images_per_prompt,
+    num_inference_steps=20
+)
+decoder_output = decoder(
+    image_embeddings=prior_output.image_embeddings.half(),
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    guidance_scale=0.0,
+    output_type="pil",
+    num_inference_steps=10
+).images
+
+display(decoder_output[0])
 ```
 
 ## Explanation
 
-Image generators like Würstchen and Stable Cascade create images via a multi-stage process. <br/>
+Image generators like Würstchen and Stable Cascade create images via a multi-stage process.
 Stage A is the ultimate stage, responsible for rendering out full-resolution, human-interpretable images (based on the output from prior stages).
 
-The original Stage A tends to render slightly-smoothed-out images with a distinctive grain pattern on top.
+The original Stage A tends to render slightly-smoothed-out images with a distinctive noise pattern on top.
 
-`stage-a-ft-hq` was finetuned on a high-quality dataset in order to generate cleaner, sharper, more realistic textures with fewer distinctive artifacts.
+`stage-a-ft-hq` was finetuned briefly on a high-quality dataset in order to reduce these artifacts.
 
-## Recommended Settings
+## Suggested Settings
 
-To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which improves mid-level detail).
+To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which [improves mid-level detail](https://old.reddit.com/r/StableDiffusion/comments/1ar359h/cascade_can_generate_directly_at_1536x1536_and/kqhjtk5/)).
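
A minimal sketch of the Stage B step-count suggestion, not part of the commit itself: in the usage example, the Stage B step count is the `num_inference_steps` argument of the decoder call, so raising it is the only change needed. The helper name `build_decoder_kwargs` and the default of 20 steps are illustrative assumptions; the README does not say how large "large" should be.

```python
# Hypothetical helper (not part of diffusers): gathers the decoder call
# arguments from the usage example so the Stage B step count is easy to vary.
def build_decoder_kwargs(image_embeddings, prompt, negative_prompt="", stage_b_steps=20):
    # stage_b_steps=20 is an arbitrary illustrative default; the usage example uses 10.
    if stage_b_steps < 1:
        raise ValueError("stage_b_steps must be a positive integer")
    return {
        "image_embeddings": image_embeddings,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "guidance_scale": 0.0,
        "output_type": "pil",
        "num_inference_steps": stage_b_steps,  # Stage B step count
    }

# Usage, assuming `decoder`, `prior_output`, and `prompt` exist as in the usage example:
# images = decoder(**build_decoder_kwargs(prior_output.image_embeddings.half(), prompt)).images
```

More Stage B steps increase decoding time, so it may be worth comparing a few values on the same prior output before settling on one.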