|
--- |
|
license: cc-by-nc-sa-4.0
|
library_name: diffusers |
|
tags: |
|
- text-to-image |
|
- stable-diffusion |
|
- diffusion distillation |
|
|
pipeline_tag: text-to-image
|
--- |
|
|
|
# DMD2 Model Card |
|
|
|
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63363b864067f020756275b7/YhssMfS_1e6q5fHKh9qrc.jpeg) |
|
|
|
> [**Improved Distribution Matching Distillation for Fast Image Synthesis**](https://arxiv.org/abs/2405.14867), |
|
> Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman |
|
|
|
## Contact |
|
|
|
Feel free to contact us if you have any questions about the paper! |
|
|
|
Tianwei Yin [tianweiy@mit.edu](mailto:tianweiy@mit.edu) |
|
|
|
## Usage |
|
|
|
We can use the standard diffusers pipeline:
|
|
|
#### 4-step UNet generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used for training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```
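The generation call accepts the usual diffusers arguments. As a minimal sketch continuing from the block above (the seed and batch size below are arbitrary illustration values, not settings from the paper), you can seed a generator for reproducibility and draw several samples per prompt:

```python
# Sketch: seeded, batched sampling with the 4-step pipeline defined above.
# The seed (0) and batch size (4) are arbitrary example values.
generator = torch.Generator("cuda").manual_seed(0)
images = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    timesteps=[999, 749, 499, 249],
    num_images_per_prompt=4,
    generator=generator,
).images
for i, img in enumerate(images):
    img.save(f"cat_{i}.png")
```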
|
|
|
#### 4-step LoRA generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_lora_fp16.safetensors"

# Load model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=1.0)  # you may want a smaller scale for community models

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used for training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```
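As the comment above suggests, the LoRA can also be fused into a community SDXL fine-tune, typically with a smaller `lora_scale`. A rough sketch, reusing `repo_name` and `ckpt_name` from the block above; the repository id below is a placeholder and 0.8 is only an example scale:

```python
# Sketch: fusing the DMD2 LoRA into a community SDXL checkpoint.
# "someuser/some-sdxl-finetune" is a placeholder, not a real repo id, and
# lora_scale=0.8 is just an example of "smaller than 1.0"; tune it per model.
community_model_id = "someuser/some-sdxl-finetune"  # placeholder
pipe = DiffusionPipeline.from_pretrained(community_model_id, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.8)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt="a photo of a cat", num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
```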
|
|
|
#### 1-step UNet generation |
|
|
|
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_1step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of a cat"
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
```
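If you want a rough sense of the single-step latency on your own hardware, a simple wall-clock measurement works. This is illustrative only, not a benchmark from the paper, and it ignores warm-up and compilation effects:

```python
# Sketch: rough wall-clock timing of the 1-step pipeline defined above.
import time

torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
torch.cuda.synchronize()
print(f"1-step generation took {time.perf_counter() - start:.3f} s")
```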
|
|
|
#### 4-step T2I Adapter |
|
|
|
```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, AutoencoderKL, UNet2DConditionModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
from huggingface_hub import hf_hub_download

# Load the canny T2I adapter.
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    base_model_id, unet=unet, vae=vae, adapter=adapter, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()

canny_detector = CannyDetector()

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the canny map at low resolution to avoid high-frequency details.
image = canny_detector(image, detect_resolution=384, image_resolution=1024)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"

gen_images = pipe(
    prompt=prompt,
    image=image,
    num_inference_steps=4,
    guidance_scale=0,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=0.5,
    timesteps=[999, 749, 499, 249],
).images[0]
gen_images.save('out_canny.png')
```
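To check how closely the output follows the canny condition, you can place the condition map and the generation side by side with the `make_image_grid` helper imported above. A small optional sketch, continuing from the block above:

```python
# Optional sketch: compare the canny condition with the generated image.
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("canny_vs_generation.png")
```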
|
|
|
For more information, please refer to the [code repository](https://github.com/tianweiy/DMD2).
|
|
|
|
|
## License |
|
|
|
Improved Distribution Matching Distillation is released under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). |
|
|
|
|
|
## Citation |
|
|
|
If you find DMD2 useful or relevant to your research, please kindly cite our papers: |
|
|
|
```bibtex
|
@article{yin2024improved, |
|
title={Improved Distribution Matching Distillation for Fast Image Synthesis}, |
|
author={Yin, Tianwei and Gharbi, Micha{\"e}l and Park, Taesung and Zhang, Richard and Shechtman, Eli and Durand, Fredo and Freeman, William T}, |
|
journal={arXiv:2405.14867}, |
|
year={2024} |
|
} |
|
|
|
@inproceedings{yin2024onestep, |
|
title={One-step Diffusion with Distribution Matching Distillation}, |
|
author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung}, |
|
booktitle={CVPR}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
This work was done while Tianwei Yin was a full-time student at MIT. It was developed based on our reimplementation of the original DMD paper. This work was supported by the National Science Foundation under Cooperative Agreement PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/), by NSF Grant 2105819, by NSF CISE award 1955864, and by funding from Google, GIST, Amazon, and Quanta Computer. |