Improve generation quality with FreeU
The UNet is responsible for denoising during the reverse diffusion process, and there are two distinct features in its architecture:
- Backbone features primarily contribute to the denoising process
- Skip features mainly introduce high-frequency features into the decoder module and can make the network overlook the semantics in the backbone features
However, the skip connection can sometimes introduce unnatural image details. FreeU is a technique for improving image quality by rebalancing the contributions from the UNet’s skip connections and backbone feature maps.
FreeU is applied during inference and it does not require any additional training. The technique works for different tasks such as text-to-image, image-to-image, and text-to-video.
In this guide, you will apply FreeU to the StableDiffusionPipeline, StableDiffusionXLPipeline, and TextToVideoSDPipeline. You need to install Diffusers from source to run the examples below.
StableDiffusionPipeline
Load the pipeline:
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
Then enable the FreeU mechanism with the FreeU-specific hyperparameters. These values are scaling factors for the backbone and skip features.
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
The values above are from the official FreeU code repository where you can also find reference hyperparameters for different models.
Disable the FreeU mechanism by calling disable_freeu()
on a pipeline.
And then run inference:
prompt = "A squirrel eating a burger"
seed = 2023
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
The figure below compares non-FreeU and FreeU results respectively for the same hyperparameters used above (prompt
and seed
):
Let’s see how Stable Diffusion 2 results are impacted:
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
prompt = "A squirrel eating a burger"
seed = 2023
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
Stable Diffusion XL
Finally, let’s take a look at how FreeU affects Stable Diffusion XL results:
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
).to("cuda")
prompt = "A squirrel eating a burger"
seed = 2023
# Comes from
# https://wandb.ai/nasirk24/UNET-FreeU-SDXL/reports/FreeU-SDXL-Optimal-Parameters--Vmlldzo1NDg4NTUw
pipeline.enable_freeu(s1=0.6, s2=0.4, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
Text-to-video generation
FreeU can also be used to improve video quality:
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch
model_id = "cerspense/zeroscope_v2_576w"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
prompt = "an astronaut riding a horse on mars"
seed = 2023
# The values come from
# https://github.com/lyn-rgb/FreeU_Diffusers#video-pipelines
pipe.enable_freeu(b1=1.2, b2=1.4, s1=0.9, s2=0.2)
video_frames = pipe(prompt, height=320, width=576, num_frames=30, generator=torch.manual_seed(seed)).frames
export_to_video(video_frames, "astronaut_rides_horse.mp4")
Thanks to kadirnar for helping to integrate the feature, and to justindujardin for the helpful discussions.