|
--- |
|
license: cc-by-nc-sa-4.0
|
library_name: diffusers |
|
tags: |
|
- text-to-image |
|
- stable-diffusion |
|
- diffusion distillation |
|
|
pipeline_tag: text-to-image
|
--- |
|
|
|
# DMD2 Model Card |
|
|
|
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63363b864067f020756275b7/YhssMfS_1e6q5fHKh9qrc.jpeg) |
|
|
|
> [**Improved Distribution Matching Distillation for Fast Image Synthesis**](https://arxiv.org/abs/2405.14867), |
|
> Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman |
|
|
|
## Contact |
|
|
|
Feel free to contact us if you have any questions about the paper! |
|
|
|
Tianwei Yin [tianweiy@mit.edu](mailto:tianweiy@mit.edu) |
|
|
|
## Usage |
|
|
|
We can use the standard diffusers pipeline:
|
|
|
#### 4-step UNet generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used for training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```
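The generation call accepts the usual diffusers arguments. As a minimal sketch continuing from the block above (the seed and batch size below are arbitrary illustration values, not settings from the paper), you can seed a generator for reproducibility and draw several samples per prompt:

```python
# Sketch: seeded, batched sampling with the 4-step pipeline defined above.
# The seed (0) and batch size (4) are arbitrary example values.
generator = torch.Generator("cuda").manual_seed(0)
images = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    timesteps=[999, 749, 499, 249],
    num_images_per_prompt=4,
    generator=generator,
).images
for i, img in enumerate(images):
    img.save(f"cat_{i}.png")
```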
|
|
|
#### 4-step LoRA generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_lora_fp16.safetensors"

# Load model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=1.0)  # you may want a smaller scale for community models

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used for training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```
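As the comment above suggests, the LoRA can also be fused into a community SDXL fine-tune, typically with a smaller `lora_scale`. A rough sketch, reusing `repo_name` and `ckpt_name` from the block above; the repository id below is a placeholder and 0.8 is only an example scale:

```python
# Sketch: fusing the DMD2 LoRA into a community SDXL checkpoint.
# "someuser/some-sdxl-finetune" is a placeholder, not a real repo id, and
# lora_scale=0.8 is just an example of "smaller than 1.0"; tune it per model.
community_model_id = "someuser/some-sdxl-finetune"  # placeholder
pipe = DiffusionPipeline.from_pretrained(community_model_id, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.8)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt="a photo of a cat", num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```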
|
|
|
#### 1-step UNet generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_1step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of a cat"
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
```
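If you want a rough sense of the single-step latency on your own hardware, a simple wall-clock measurement works. This is illustrative only, not a benchmark from the paper, and it ignores warm-up and compilation effects:

```python
# Sketch: rough wall-clock timing of the 1-step pipeline defined above.
import time

torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
torch.cuda.synchronize()
print(f"1-step generation took {time.perf_counter() - start:.3f} s")
```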
|
|
|
#### 4-step T2I Adapter |
|
|
|
```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, AutoencoderKL, UNet2DConditionModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
from huggingface_hub import hf_hub_download

# Load the canny T2I adapter.
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    base_model_id, unet=unet, vae=vae, adapter=adapter, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()

canny_detector = CannyDetector()

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the canny map at low resolution to avoid high-frequency details.
image = canny_detector(image, detect_resolution=384, image_resolution=1024)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"

gen_images = pipe(
    prompt=prompt,
    image=image,
    num_inference_steps=4,
    guidance_scale=0,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=0.5,
    timesteps=[999, 749, 499, 249],
).images[0]
gen_images.save('out_canny.png')
```
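To check how closely the output follows the canny condition, you can place the condition map and the generation side by side with the `make_image_grid` helper imported above. A small optional sketch, continuing from the block above:

```python
# Optional sketch: compare the canny condition with the generated image.
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("canny_vs_generation.png")
```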
|
|
|
For more information, please refer to the [code repository](https://github.com/tianweiy/DMD2).
|
|
|
|
|
## License |
|
|
|
Improved Distribution Matching Distillation is released under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). |
|
|
|
|
|
## Citation |
|
|
|
If you find DMD2 useful or relevant to your research, please kindly cite our papers: |
|
|
|
```bibtex
|
@article{yin2024improved, |
|
title={Improved Distribution Matching Distillation for Fast Image Synthesis}, |
|
author={Yin, Tianwei and Gharbi, Micha{\"e}l and Park, Taesung and Zhang, Richard and Shechtman, Eli and Durand, Fredo and Freeman, William T}, |
|
journal={arXiv:2405.14867}, |
|
year={2024} |
|
} |
|
|
|
@inproceedings{yin2024onestep, |
|
title={One-step Diffusion with Distribution Matching Distillation}, |
|
author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung}, |
|
booktitle={CVPR}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
This work was done while Tianwei Yin was a full-time student at MIT. It was developed based on our reimplementation of the original DMD paper. This work was supported by the National Science Foundation under Cooperative Agreement PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/), by NSF Grant 2105819, by NSF CISE award 1955864, and by funding from Google, GIST, Amazon, and Quanta Computer. |