StableSR for Stable Diffusion WebUI

Licensed under S-Lab License 1.0 and CC BY-NC-SA 4.0


  • StableSR is a competitive super-resolution method originally proposed by Jianyi Wang et al.
  • This repository is a migration of the StableSR project to the Automatic1111 WebUI.

Relevant Links

Click to view high-quality official examples!

If you find this project useful, please give me & Jianyi Wang a star! ⭐


Features

  1. High-fidelity detailed image upscaling:
    • Produces highly detailed results while preserving the identity of your characters' faces.
    • Suitable for most images (realistic or anime, photography or AIGC, SD 1.5 or Midjourney images...). See the Official Examples.
  2. Less VRAM consumption
    • I removed the VRAM-expensive modules from the official implementation.
    • The remaining model is much smaller than the ControlNet Tile model and requires less VRAM.
    • When combined with Tiled Diffusion & VAE, you can do 4k image super-resolution with limited VRAM (e.g., < 12 GB).

      Please be aware that the sdp attention optimization may lead to OOM for unknown reasons. Use xformers instead if that happens.

  3. Wavelet Color Fix
    • The official StableSR significantly changes the colors of the generated image. The problem is even more prominent when upscaling in tiles.
    • I implemented a powerful post-processing technique that effectively matches the color of the upscaled image to the original. See the Wavelet Color Fix Example.

Usage

1. Installation

⚪ Method 1: Official Market

  • Open Automatic1111 WebUI -> Click Tab "Extensions" -> Click Tab "Available" -> Find "StableSR" -> Click "Install"

⚪ Method 2: URL Install

  • Open Automatic1111 WebUI -> Click Tab "Extensions" -> Click Tab "Install from URL" -> Paste this repository's URL -> Click "Install"

2. Download the main components

  • You MUST use the Stable Diffusion V2.1 512 EMA checkpoint (~5.21GB) from StabilityAI

    • You can download it from HuggingFace
    • Put into stable-diffusion-webui/models/Stable-Diffusion/

    While it requires an SD2.1 checkpoint, you can still upscale ANY image (even from SD1.5 or NSFW). Your image won't be censored and the output quality won't be affected.

  • Download the extracted StableSR module

    • Official resources: HuggingFace (~1.2 GB). Note that this is a zip file containing both the StableSR module and the VQVAE.
    • My resources: <GoogleDrive> <Baidu Netdisk - extraction code: aguq>
    • Put the StableSR module (~400MB) into your stable-diffusion-webui/extensions/sd-webui-stablesr/models/ (a scripted download sketch follows below)
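
For convenience, here is a minimal Python sketch of scripting both downloads with huggingface_hub. The StabilityAI repo id and filename below are the published ones; the StableSR repo id and filename are placeholders, so take the actual values from the official links above.

```python
# Sketch: download the two checkpoints into the WebUI folders with huggingface_hub.
from huggingface_hub import hf_hub_download

# Stable Diffusion 2.1 512 EMA checkpoint from StabilityAI (~5.21GB)
sd21_path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-2-1-base",
    filename="v2-1_512-ema-pruned.ckpt",
    local_dir="stable-diffusion-webui/models/Stable-Diffusion",
)

# Extracted StableSR module (~400MB). The repo id and filename here are placeholders;
# use the official HuggingFace link above for the real ones.
stablesr_path = hf_hub_download(
    repo_id="<official-stablesr-repo>",
    filename="<stablesr-module>.ckpt",
    local_dir="stable-diffusion-webui/extensions/sd-webui-stablesr/models",
)

print(sd21_path, stablesr_path)
```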

3. Optional components

  • Install Tiled Diffusion & VAE extension
    • The original StableSR easily runs out of memory (OOM) for images larger than 512px.
    • For better quality and less VRAM usage, we recommend Tiled Diffusion & VAE.
  • Use the Official VQGAN VAE

4. Extension Usage

  • At the top of the WebUI, select the v2-1_512-ema-pruned checkpoint you downloaded.
  • Switch to the img2img tab. Find the "Scripts" dropdown at the bottom of the page.
    • Select the StableSR script.
    • Click the refresh button and select the StableSR checkpoint you have downloaded.
    • Choose a scale factor.
  • Upload your image and start generation (can work without prompts).
  • The Euler a sampler is recommended, with CFG Scale <= 2 and Steps >= 20.
  • For output image sizes > 512, we recommend using Tiled Diffusion & VAE; otherwise the image quality may not be ideal and the VRAM usage will be huge.
  • Here are the official Tiled Diffusion settings:
    • Method = Mixture of Diffusers
    • Latent tile size = 64, Latent tile overlap = 32
    • Latent tile batch size as large as possible before Out of Memory.
    • Upscaler MUST be None (don't upscale here; the upscaling is done by StableSR).
  • The following figure shows the recommended settings for 24GB VRAM.
    • For a 6GB device, just change Tiled Diffusion Latent tile batch size to 1, Tiled VAE Encoder Tile Size to 1024, Decoder Tile Size to 128.
    • SDP attention optimization may lead to OOM. Please use xformers in that case.
    • You DON'T need to change the other settings in Tiled Diffusion & Tiled VAE unless you have a very deep understanding; these params are almost optimal for StableSR. The recommended values are also summarized in the sketch below.
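
For quick reference, the same values written out as a plain Python dict. The key names here are descriptive only; the options are set in the Tiled Diffusion / Tiled VAE panels, not via code.

```python
# Recommended values from the list above, written out as a plain reference.
# Set these in the WebUI panels; the key names are descriptive, not internal option names.
tiled_diffusion = {
    "method": "Mixture of Diffusers",
    "latent_tile_size": 64,
    "latent_tile_overlap": 32,
    "latent_tile_batch_size": "as large as VRAM allows",  # use 1 on a 6GB card
    "upscaler": None,  # upscale in StableSR, not here
}
tiled_vae = {
    "encoder_tile_size": 1024,  # suggested value for a 6GB card
    "decoder_tile_size": 128,   # suggested value for a 6GB card
}
```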

5. Options Explained

  • What is "Pure Noise"?
    • Pure Noise refers to starting from a fully random noise tensor instead of your image. This is the default behavior in the StableSR paper.
    • When enabling it, the script ignores your denoising strength and gives you much more detailed images, but it also changes the color & sharpness significantly.
    • When disabling it, the script starts by adding some noise to your image. The result will not be fully detailed, even if you set denoising strength = 1 (but it may be aesthetically pleasing). See Comparison.
    • If you disable Pure Noise, we recommend a denoising strength of 1.
  • What is "Color Fix"?
    • This is to mitigate the color shift problem from StableSR and the tiling process.
    • AdaIN simply matches the color statistics (channel-wise mean and standard deviation) of the outcome image to those of the original. This is the official algorithm, but it is ineffective in many cases.
    • Wavelet decomposes the original and the outcome images into low- and high-frequency components, then replaces the outcome image's low-frequency part (colors) with the original's. This is very effective against uneven color shifts. The algorithm comes from GIMP and Krita and takes several seconds per image. (A minimal sketch of both ideas follows this list.)
    • When enabling color fix, the original image will also show up in your preview window, but will NOT be saved automatically.
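
For illustration only, here is a minimal Python sketch of the two color-fix ideas on (C, H, W) RGB tensors, using repeated blurring as a crude stand-in for the wavelet low-frequency component; the extension's actual implementation may differ.

```python
import torch
import torch.nn.functional as F

def adain_color_fix(result: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
    """Match the channel-wise mean/std of `result` to `source` (both (C, H, W), same size)."""
    r_mean, r_std = result.mean(dim=(1, 2), keepdim=True), result.std(dim=(1, 2), keepdim=True)
    s_mean, s_std = source.mean(dim=(1, 2), keepdim=True), source.std(dim=(1, 2), keepdim=True)
    return (result - r_mean) / (r_std + 1e-5) * s_std + s_mean

def low_freq(img: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Crude low-frequency estimate: repeated 5x5 box blurs (stand-in for a wavelet decomposition)."""
    channels = img.shape[0]
    kernel = torch.ones(channels, 1, 5, 5) / 25.0
    out = img.unsqueeze(0)
    for _ in range(levels):
        out = F.conv2d(out, kernel, padding=2, groups=channels)
    return out.squeeze(0)

def wavelet_color_fix(result: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
    """Keep `result`'s high frequencies (details) but take `source`'s low frequencies (colors).
    `source` must be resized to `result`'s resolution first."""
    return (result - low_freq(result)) + low_freq(source)
```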

6. Important Notice

Why are my results different from the official examples?

  • It is neither your fault nor ours.
    • This extension has the same UNet model weights as the official StableSR, if installed correctly.
    • If you install the optional VQVAE, the whole model's weights will be the same as the official model with fusion weight = 0.
  • However, your results may not be as good as the official ones, because:
    • Sampler Difference:
      • The official repo does 100 or 200 steps of legacy DDPM sampling with a custom timestep scheduler, and samples without negative prompts.
      • However, WebUI doesn't offer such a sampler, and it must sample with negative prompts. This is the main difference.
    • VQVAE Decoder Difference:
      • The official VQVAE Decoder takes some Encoder features as input.
      • However, in practice, I found these features are astonishingly huge for large images (>10 GB for 4K images, even in float16!).
      • Hence, I removed the CFW component from the VAE Decoder. As this leads to inferior fidelity in details, I will try to add it back later as an option.

License

This project is licensed under:

CC BY-NC-SA 4.0

Disclaimer

  • All code in this extension is for research purposes only.
  • The commercial use of the code and checkpoint is strictly prohibited.

Important Notice for Outcome Images

  • Please note that the CC BY-NC-SA 4.0 license in the NVIDIA SPADE module also prohibits the commercial use of outcome images.
  • Jianyi Wang may change the SPADE module to a commercial-friendly one, but he is busy.
  • If you wish to speed up his process for commercial purposes, please contact him through email: iceclearwjy@gmail.com

Acknowledgments

I would like to thank Jianyi Wang et al. for the original StableSR method.