---
license: mit
---
# CleanDIFT Model Card
Diffusion models learn powerful world representations that have proven valuable for tasks like semantic correspondence detection, depth estimation, semantic segmentation, and classification. However, diffusion models require noisy input images, which destroys information and introduces the noise level as a hyperparameter that needs to be tuned for each task.
We introduce CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. The approach is efficient, training on a single GPU in just 30 minutes. We publish these models alongside our paper "CleanDIFT: Diffusion Features without Noise".
We provide checkpoints for Stable Diffusion 1.5 and Stable Diffusion 2.1.
## Usage
For detailed examples on how to extract features with CleanDIFT and how to use them for downstream tasks, please refer to the notebooks provided here.
Our checkpoints are fully compatible with the `diffusers` library. If you already have a pipeline using SD 1.5 or SD 2.1 from `diffusers`, you can simply replace the U-Net state dict:
```python
from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Instantiate the standard SD 2.1 U-Net, then overwrite its weights with the CleanDIFT checkpoint.
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")

ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)
```
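Once the CleanDIFT weights are loaded, features can be read off the U-Net's intermediate activations by passing a clean (un-noised) image through it. The sketch below continues from the snippet above (reusing the `unet` we just loaded) and is only a minimal illustration under several assumptions: the choice of `up_blocks[1]` as the feature layer, the 768×768 input resolution, the empty prompt, the placeholder timestep of 0, and the file name `example.jpg` are all illustrative rather than prescribed. Please refer to the notebooks for the exact extraction setup used in the paper.

```python
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Reuse the SD 2.1 pipeline for its VAE and text encoder; swap in the CleanDIFT U-Net.
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to(device)
pipe.unet = unet.to(device)

# Capture intermediate activations from one decoder block via a forward hook.
# Which block (and thus which feature resolution) works best is task-dependent;
# up_blocks[1] is only an example choice.
features = {}
handle = pipe.unet.up_blocks[1].register_forward_hook(
    lambda module, inputs, output: features.update(feats=output)
)

# Preprocess a clean image (no noise is added) and encode it into VAE latents.
preprocess = transforms.Compose([
    transforms.Resize((768, 768)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixel values to [-1, 1]
])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)

with torch.no_grad():
    latents = pipe.vae.encode(image).latent_dist.sample() * pipe.vae.config.scaling_factor

    # Empty-prompt text embeddings for the U-Net's cross-attention conditioning.
    tokens = pipe.tokenizer(
        "", padding="max_length", max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
    ).input_ids.to(device)
    text_emb = pipe.text_encoder(tokens)[0]

    # CleanDIFT features are timestep-independent; the timestep passed here is a
    # placeholder (t=0 is an assumption, not a prescribed setting).
    pipe.unet(latents, torch.zeros(1, dtype=torch.long, device=device), encoder_hidden_states=text_emb)

handle.remove()
print(features["feats"].shape)  # feature map that can feed a downstream head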