stabilityai/sd-x2-latent-upscaler

Tags: Diffusers, Safetensors, StableDiffusionLatentUpscalePipeline, stable-diffusion

patrickvonplaten committed on Feb 8, 2023

Commit 1fa262a (1 parent: 7b376c4)

Update README.md

Files changed (1): README.md +35 -19

@@ -7,8 +7,18 @@ inference: false
 ---
 # Stable Diffusion x2 latent upscaler model card
-This model card focuses on the latent diffusion-based upscaler developed by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion) in collaboration with [Stability AI](https://stability.ai/). A notebook that demonstrates the original implementation can be found [here](https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4).
-This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model, which is decoded into a full-resolution image.  To use it with Stable Diffusion, You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it.
 | ![upscaler.jpg](https://pbs.twimg.com/media/FhK0YjAVUAUtBbx?format=jpg&name=4096x4096) |
 |:--:|
@@ -30,36 +40,31 @@ Original output image             |  2x upscaled output image
 ## Examples
-Using the [🤗's Diffusers library](https://github.com/huggingface/diffusers) to run latent upscaler on top of any `StableDiffusionUpscalePipeline` checkpoint to enhance its output image resolution by a factor of 2.
 ```bash
-pip install diffusers transformers accelerate scipy safetensors
 ```
 ```python
 from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
 import torch
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
 pipeline.to("cuda")
-model_id = "stabilityai/sd-x2-latent-upscaler"
-upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 upscaler.to("cuda")
 prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
 generator = torch.manual_seed(33)
 low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images
-with torch.no_grad():
-    image = pipeline.decode_latents(low_res_latents)
-image = pipeline.numpy_to_pil(image)[0]
-image.save("../images/a1.png")
 upscaled_image = upscaler(
     prompt=prompt,
     image=low_res_latents,
@@ -68,14 +73,27 @@ upscaled_image = upscaler(
     generator=generator,
 ).images[0]
-upscaled_image.save("../images/a2.png")
 ```
 **Notes**:
 - Despite not being a dependency, we highly recommend installing [xformers](https://github.com/facebookresearch/xformers) for memory-efficient attention (better performance).
 - If you have limited GPU RAM available, add a `pipeline.enable_attention_slicing()` call after moving the pipeline to `cuda` to reduce VRAM usage (at the cost of speed).
 # Uses
 ## Direct Use
@@ -129,6 +147,4 @@ which consists of images that are limited to English descriptions.
 Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
 This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
 ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
-Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.

 ---
 # Stable Diffusion x2 latent upscaler model card
+This model card focuses on the latent diffusion-based upscaler developed by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion)
+in collaboration with [Stability AI](https://stability.ai/).
+This model was trained on a high-resolution subset of the LAION-2B dataset.
+It is a diffusion model that operates in the same latent space as the Stable Diffusion model; its output latent is decoded into a full-resolution image.
+To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.
+Alternatively, you can take any image, encode it into the latent space, run the upscaler, and decode the result.
+**Note**:
+This upscaling model is designed explicitly for **Stable Diffusion**, as it upscales Stable Diffusion's denoised latent image embeddings.
+This allows for very fast text-to-image + upscaling pipelines, as all intermediate states can be kept on the GPU. For more information, see the example below.
+This model works on all [Stable Diffusion checkpoints](https://huggingface.co/models?other=stable-diffusion).
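Staying in latent space works because the text-to-image model and the upscaler share the Stable Diffusion VAE's latent format. The following is a minimal sketch of the resulting shape arithmetic, assuming the Stable Diffusion v1 VAE (4 latent channels, 8x spatial downsampling) used by `CompVis/stable-diffusion-v1-4` and a 512x512 text-to-image output. The helper functions are hypothetical illustrations, not part of the Diffusers API.

```python
# Hypothetical helpers sketching tensor shapes in the
# Stable Diffusion -> x2 latent upscaler workflow.
# Assumption: Stable Diffusion v1 VAE with 4 latent channels
# and a spatial downsampling factor of 8.
LATENT_CHANNELS = 4
VAE_DOWNSAMPLE = 8
UPSCALE_FACTOR = 2


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Shape (C, H, W) of the VAE latent for a height x width pixel image."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image size must be divisible by the VAE downsampling factor")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def upscaled_latent_shape(shape: tuple[int, int, int]) -> tuple[int, int, int]:
    """The x2 latent upscaler keeps the channel count and doubles H and W."""
    channels, h, w = shape
    return (channels, h * UPSCALE_FACTOR, w * UPSCALE_FACTOR)


def decoded_image_size(shape: tuple[int, int, int]) -> tuple[int, int]:
    """Pixel size (H, W) after decoding a latent with the same VAE."""
    _, h, w = shape
    return (h * VAE_DOWNSAMPLE, w * VAE_DOWNSAMPLE)


low_res = latent_shape(512, 512)           # latent returned with output_type="latent"
high_res = upscaled_latent_shape(low_res)  # latent produced by the upscaler
print(low_res, high_res, decoded_image_size(high_res))
# -> (4, 64, 64) (4, 128, 128) (1024, 1024)
```

Under these assumptions, the upscaler turns a 4x64x64 latent into a 4x128x128 latent, which the VAE decodes to 1024x1024 pixels. Passing the latent directly avoids a decode and re-encode round trip between the two models.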
 | ![upscaler.jpg](https://pbs.twimg.com/media/FhK0YjAVUAUtBbx?format=jpg&name=4096x4096) |
 |:--:|
 ## Examples
+Use the [🤗 Diffusers library](https://github.com/huggingface/diffusers) to run the latent upscaler on top of any Stable Diffusion (`StableDiffusionPipeline`) checkpoint
+to enhance its output image resolution by a factor of 2.
 ```bash
+pip install git+https://github.com/huggingface/diffusers.git
+pip install transformers accelerate scipy safetensors
 ```
 ```python
 from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
 import torch
 pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
 pipeline.to("cuda")
+upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
 upscaler.to("cuda")
 prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
 generator = torch.manual_seed(33)
+# we stay in latent space! Let's make sure that Stable Diffusion returns the image
+# in latent space
 low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images
 upscaled_image = upscaler(
     prompt=prompt,
     image=low_res_latents,
     generator=generator,
 ).images[0]
+# Save the upscaled image as "astronaut_1024.png"
+upscaled_image.save("astronaut_1024.png")
+# For comparison, also decode and save the low-res image
+with torch.no_grad():
+    image = pipeline.decode_latents(low_res_latents)
+image = pipeline.numpy_to_pil(image)[0]
+image.save("astronaut_512.png")
 ```
+**1024-res Astronaut**
+![upscaled](./astronaut_1024.png)
+**512-res Astronaut**
+![low_res](./astronaut_512.png)
 **Notes**:
 - Despite not being a dependency, we highly recommend installing [xformers](https://github.com/facebookresearch/xformers) for memory-efficient attention (better performance).
 - If you have limited GPU RAM available, add a `pipeline.enable_attention_slicing()` call after moving the pipeline to `cuda` to reduce VRAM usage (at the cost of speed).
 # Uses
 ## Direct Use
 Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
 This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
 ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
+Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.