t2iadapter_depth_sd15v2 / README.md

williamberman



	---
	license: apache-2.0
	base_model: runwayml/stable-diffusion-v1-5
	tags:
	- art
	- t2i-adapter
	- controlnet
	- stable-diffusion
	- image-to-image
	---

	# T2I Adapter - Depth

	T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is meant to be used with a specific base Stable Diffusion checkpoint.

	This checkpoint provides depth conditioning for the Stable Diffusion 1.5 checkpoint.

	## Model Details
	- Developed by: the authors of "T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models"
	- Model type: Diffusion-based text-to-image generation model
	- Language(s): English
	- License: Apache 2.0
	- Resources for more information: [GitHub Repository](https://github.com/TencentARC/T2I-Adapter), [Paper](https://arxiv.org/abs/2302.08453).
	- Cite as:

	@misc{
	title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models},
	author={Chong Mou and Xintao Wang and Liangbin Xie and Yanze Wu and Jian Zhang and Zhongang Qi and Ying Shan and Xiaohu Qie},
	year={2023},
	eprint={2302.08453},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}

	### Checkpoints

	| Model Name | Control Image Overview | Control Image Example | Generated Image Example |
	|---|---|---|---|
	|[TencentARC/t2iadapter_color_sd14v1](https://huggingface.co/TencentARC/t2iadapter_color_sd14v1)<br/> Trained with spatial color palette | An image with an 8x8 color palette.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_canny_sd14v1](https://huggingface.co/TencentARC/t2iadapter_canny_sd14v1)<br/> Trained with Canny edge detection | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_sketch_sd14v1](https://huggingface.co/TencentARC/t2iadapter_sketch_sd14v1)<br/> Trained with [PidiNet](https://github.com/zhuoinoulu/pidinet) edge detection | A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_depth_sd14v1](https://huggingface.co/TencentARC/t2iadapter_depth_sd14v1)<br/> Trained with MiDaS depth estimation | A grayscale image with black representing deep areas and white representing shallow areas.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_openpose_sd14v1](https://huggingface.co/TencentARC/t2iadapter_openpose_sd14v1)<br/> Trained with OpenPose bone image | An [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_keypose_sd14v1](https://huggingface.co/TencentARC/t2iadapter_keypose_sd14v1)<br/> Trained with MMPose skeleton image | An [MMPose skeleton](https://github.com/open-mmlab/mmpose) image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_seg_sd14v1](https://huggingface.co/TencentARC/t2iadapter_seg_sd14v1)<br/> Trained with semantic segmentation | A [custom](https://github.com/TencentARC/T2I-Adapter/discussions/25) segmentation protocol image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_output.png"/></a>|
	|[TencentARC/t2iadapter_canny_sd15v2](https://huggingface.co/TencentARC/t2iadapter_canny_sd15v2)| | | |
	|[TencentARC/t2iadapter_depth_sd15v2](https://huggingface.co/TencentARC/t2iadapter_depth_sd15v2)| | | |
	|[TencentARC/t2iadapter_sketch_sd15v2](https://huggingface.co/TencentARC/t2iadapter_sketch_sd15v2)| | | |
	|[TencentARC/t2iadapter_zoedepth_sd15v1](https://huggingface.co/TencentARC/t2iadapter_zoedepth_sd15v1)| | | |
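
The depth row above describes the control image as grayscale, with white for shallow (near) areas and black for deep (far) areas. If your depth values come from a source other than a MiDaS-style detector, they may need to be mapped into that convention first. The following is a minimal sketch, not part of the T2I-Adapter code base: `depth_to_gray` is a hypothetical helper, and it assumes the input is a grid of raw distances where larger values mean farther from the camera.

```python
from typing import List, Sequence


def depth_to_gray(depth: Sequence[Sequence[float]]) -> List[List[int]]:
    """Map raw depth values to 8-bit gray levels (near = white, far = black).

    Assumption: larger input values mean farther from the camera.
    """
    flat = [value for row in depth for value in row]
    if not flat:
        return []
    nearest, farthest = min(flat), max(flat)
    span = farthest - nearest
    if span == 0:
        # A flat scene carries no depth contrast; use mid-gray everywhere.
        return [[128 for _ in row] for row in depth]
    # Invert so the nearest point maps to 255 and the farthest to 0.
    return [
        [round(255 * (farthest - value) / span) for value in row]
        for row in depth
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 depth grid, in arbitrary distance units.
    print(depth_to_gray([[1.0, 2.0], [4.0, 5.0]]))
```

The resulting rows can be turned into an image with any imaging library. If your source already reports inverse depth (larger means nearer), drop the inversion step.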

	## Example

	1. Install the dependencies:

	```sh
	pip install diffusers transformers controlnet_aux
	```

	2. Run the code:

	```python
	import torch
	from controlnet_aux import MidasDetector
	from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
	from PIL import Image

	# Estimate a depth map from the input photo with MiDaS.
	midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
	image = Image.open("./images/depth_input.png")
	image = midas(image)
	image.save("./images/depth.png")

	# Load the depth adapter and attach it to the Stable Diffusion 1.5 pipeline.
	adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_depth_sd15v2", torch_dtype=torch.float16)
	pipe = StableDiffusionAdapterPipeline.from_pretrained(
	    "runwayml/stable-diffusion-v1-5",
	    adapter=adapter,
	    safety_checker=None,
	    torch_dtype=torch.float16,
	    variant="fp16",
	)
	pipe.to("cuda")

	# Fix the seed so the result is reproducible.
	generator = torch.Generator().manual_seed(1)

	# Generate an image conditioned on both the prompt and the depth map.
	depth_out = pipe(prompt="storm trooper giving a speech", image=image, generator=generator).images[0]
	depth_out.save("./images/depth_output.png")
	```

	![depth_input](./images/depth_input.png)
	![depth](./images/depth.png)
	![depth_output](./images/depth_output.png)
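
The example script assumes that its dependencies are installed and that `./images/depth_input.png` exists. A small preflight check can surface both problems before any model weights are downloaded. This is only a sketch using the standard library: `missing_modules` and `preflight` are hypothetical helper names, and the module list is taken from the imports in the script above (`PIL` is provided by the Pillow package).

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List

# Top-level import names used by the example script.
REQUIRED_MODULES = ("torch", "diffusers", "transformers", "controlnet_aux", "PIL")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(input_path: str = "./images/depth_input.png") -> List[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = [f"module not installed: {name}" for name in missing_modules()]
    if not Path(input_path).is_file():
        problems.append(f"input image not found: {input_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Running this file before the example prints one line per problem and prints nothing when everything is in place. It does not check for a CUDA device, which the example also needs for `pipe.to("cuda")`.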