patrickvonplaten commited on
Commit
1fa262a
1 Parent(s): 7b376c4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +35 -19
README.md CHANGED
@@ -7,8 +7,18 @@ inference: false
7
  ---
8
 
9
  # Stable Diffusion x2 latent upscaler model card
10
- This model card focuses on the latent diffusion-based upscaler developed by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion) in collaboration with [Stability AI](https://stability.ai/). A notebook that demonstrates the original implementation can be found [here](https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4).
11
- This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model, which is decoded into a full-resolution image. To use it with Stable Diffusion, You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it.
 
 
 
 
 
 
 
 
 
 
12
 
13
  | ![upscaler.jpg](https://pbs.twimg.com/media/FhK0YjAVUAUtBbx?format=jpg&name=4096x4096) |
14
  |:--:|
@@ -30,36 +40,31 @@ Original output image | 2x upscaled output image
30
 
31
  ## Examples
32
 
33
- Using the [🤗's Diffusers library](https://github.com/huggingface/diffusers) to run latent upscaler on top of any `StableDiffusionUpscalePipeline` checkpoint to enhance its output image resolution by a factor of 2.
 
34
 
35
  ```bash
36
- pip install diffusers transformers accelerate scipy safetensors
 
37
  ```
38
 
39
  ```python
40
  from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
41
  import torch
42
 
43
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
44
-
45
  pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
46
  pipeline.to("cuda")
47
 
48
- model_id = "stabilityai/sd-x2-latent-upscaler"
49
- upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
50
  upscaler.to("cuda")
51
 
52
  prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
53
  generator = torch.manual_seed(33)
54
 
 
 
55
  low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images
56
 
57
- with torch.no_grad():
58
- image = pipeline.decode_latents(low_res_latents)
59
- image = pipeline.numpy_to_pil(image)[0]
60
-
61
- image.save("../images/a1.png")
62
-
63
  upscaled_image = upscaler(
64
  prompt=prompt,
65
  image=low_res_latents,
@@ -68,14 +73,27 @@ upscaled_image = upscaler(
68
  generator=generator,
69
  ).images[0]
70
 
71
- upscaled_image.save("../images/a2.png")
 
 
 
 
 
 
 
 
72
  ```
73
 
 
 
 
 
 
 
74
  **Notes**:
75
  - Despite not being a dependency, we highly recommend you to install [xformers](https://github.com/facebookresearch/xformers) for memory efficient attention (better performance)
76
  - If you have low GPU RAM available, make sure to add a `pipe.enable_attention_slicing()` after sending it to `cuda` for less VRAM usage (to the cost of speed)
77
 
78
-
79
  # Uses
80
 
81
  ## Direct Use
@@ -129,6 +147,4 @@ which consists of images that are limited to English descriptions.
129
  Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
130
  This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
131
  ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
132
- Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.
133
-
134
-
 
7
  ---
8
 
9
  # Stable Diffusion x2 latent upscaler model card
10
+
11
+ This model card focuses on the latent diffusion-based upscaler developed by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion)
12
+ in collaboration with [Stability AI](https://stability.ai/).
13
+ This model was trained on a high-resolution subset of the LAION-2B dataset.
14
+ It is a diffusion model that operates in the same latent space as the Stable Diffusion model, which is decoded into a full-resolution image.
15
+ To use it with Stable Diffusion, You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.
16
+ Or you can take any image, encode it into the latent space, use the upscaler, and decode it.
17
+
18
+ **Note**:
19
+ This upscaling model is designed explicitely for **Stable Diffusion** as it can upscale Stable Diffusion's latent denoised image embeddings.
20
+ This allows for very fast text-to-image + upscaling pipelines as all intermeditate states can be kept on GPU. More for information, see example below.
21
+ This model works on all [Stable Diffusion checkpoints](https://huggingface.co/models?other=stable-diffusion)
22
 
23
  | ![upscaler.jpg](https://pbs.twimg.com/media/FhK0YjAVUAUtBbx?format=jpg&name=4096x4096) |
24
  |:--:|
 
40
 
41
  ## Examples
42
 
43
+ Using the [🤗's Diffusers library](https://github.com/huggingface/diffusers) to run latent upscaler on top of any `StableDiffusionUpscalePipeline` checkpoint
44
+ to enhance its output image resolution by a factor of 2.
45
 
46
  ```bash
47
+ pip install git+https://github.com/huggingface/diffusers.git
48
+ pip install transformers accelerate scipy safetensors
49
  ```
50
 
51
  ```python
52
  from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
53
  import torch
54
 
 
 
55
  pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
56
  pipeline.to("cuda")
57
 
58
+ upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
 
59
  upscaler.to("cuda")
60
 
61
  prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
62
  generator = torch.manual_seed(33)
63
 
64
+ # we stay in latent space! Let's make sure that Stable Diffusion returns the image
65
+ # in latent space
66
  low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images
67
 
 
 
 
 
 
 
68
  upscaled_image = upscaler(
69
  prompt=prompt,
70
  image=low_res_latents,
 
73
  generator=generator,
74
  ).images[0]
75
 
76
+ # Let's save the upscaled image under "upscaled_astronaut.png"
77
+ upscaled_image.save("astronaut_1024.png")
78
+
79
+ # as a comparison: Let's also save the low-res image
80
+ with torch.no_grad():
81
+ image = pipeline.decode_latents(low_res_latents)
82
+ image = pipeline.numpy_to_pil(image)[0]
83
+
84
+ image.save("astronaut_512.png")
85
  ```
86
 
87
+ **1024-res Astronaut
88
+ ![upscaled](./astronaut_1024.png)
89
+
90
+ **512-res Astronaut
91
+ ![ow_res](./astronaut_512.png)
92
+
93
  **Notes**:
94
  - Despite not being a dependency, we highly recommend you to install [xformers](https://github.com/facebookresearch/xformers) for memory efficient attention (better performance)
95
  - If you have low GPU RAM available, make sure to add a `pipe.enable_attention_slicing()` after sending it to `cuda` for less VRAM usage (to the cost of speed)
96
 
 
97
  # Uses
98
 
99
  ## Direct Use
 
147
  Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
148
  This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
149
  ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
150
+ Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.