shuai1106 committed
Commit: 99b9db3
Parent: 4df7ba5

Upload 7 files

README.md ADDED
@@ -0,0 +1,157 @@
---
license: openrail
base_model: runwayml/stable-diffusion-v1-5
tags:
- art
- controlnet
- stable-diffusion
- controlnet-v1-1
- image-to-image
duplicated_from: ControlNet-1-1-preview/control_v11p_sd15_lineart
---

# ControlNet - v1.1 - *Lineart Version*

**ControlNet v1.1** was released in [lllyasviel/ControlNet-v1-1](https://huggingface.co/lllyasviel/ControlNet-v1-1) by [Lvmin Zhang](https://huggingface.co/lllyasviel).

This checkpoint is a conversion of [the original checkpoint](https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_lineart.pth) into `diffusers` format. It can be used in combination with **Stable Diffusion**, such as [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5).

For more details, please also have a look at the [🧨 Diffusers docs](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/controlnet).

ControlNet is a neural network structure that controls diffusion models by adding extra conditions.

![img](./sd.png)

This checkpoint corresponds to the ControlNet conditioned on **lineart images**.
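
In `diffusers` terms, the "extra conditions" enter as residuals that the ControlNet computes for each UNet block. The sketch below is illustrative only and not part of the original card: `latents`, `t`, `text_emb`, and `cond` are assumed to have been prepared by the surrounding pipeline, and `StableDiffusionControlNetPipeline` performs all of these steps internally.

```python
# Conceptual sketch of one denoising step with a ControlNet.
down_res, mid_res = controlnet(
    latents, t,
    encoder_hidden_states=text_emb,
    controlnet_cond=cond,  # the lineart control image as a tensor
    return_dict=False,
)
noise_pred = unet(
    latents, t,
    encoder_hidden_states=text_emb,
    down_block_additional_residuals=down_res,  # injected into the UNet skip connections
    mid_block_additional_residual=mid_res,
).sample
```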

## Model Details
- **Developed by:** Lvmin Zhang, Maneesh Agrawala
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
- **Resources for more information:** [GitHub Repository](https://github.com/lllyasviel/ControlNet), [Paper](https://arxiv.org/abs/2302.05543).
- **Cite as:**

      @misc{zhang2023adding,
        title={Adding Conditional Control to Text-to-Image Diffusion Models},
        author={Lvmin Zhang and Maneesh Agrawala},
        year={2023},
        eprint={2302.05543},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
      }

## Introduction

ControlNet was proposed in [*Adding Conditional Control to Text-to-Image Diffusion Models*](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions.
The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k).
Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device.
Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data.
We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc.
This may enrich the methods to control large diffusion models and further facilitate related applications.*

## Example

It is recommended to use this checkpoint with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), since the checkpoint was trained on it.
Experimentally, the checkpoint can also be used with other diffusion models, such as DreamBooth-finetuned Stable Diffusion models.

**Note**: If you want to process an image to create the auxiliary conditioning, the external dependencies below are required:

1. Install [controlnet_aux](https://github.com/patrickvonplaten/controlnet_aux):

```sh
$ pip install controlnet_aux==0.3.0
```

2. Install `diffusers` and related packages:

```sh
$ pip install diffusers transformers accelerate
```

3. Run the code:

```python
import os

import torch
from controlnet_aux import LineartDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

checkpoint = "ControlNet-1-1-preview/control_v11p_sd15_lineart"

# Load the example input image and resize it to the training resolution.
image = load_image(
    "https://huggingface.co/ControlNet-1-1-preview/control_v11p_sd15_lineart/resolve/main/images/input.png"
)
image = image.resize((512, 512))

prompt = "michael jackson concert"

# Extract a lineart map from the input image to use as conditioning.
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
control_image = processor(image)

os.makedirs("images", exist_ok=True)
control_image.save("./images/control.png")

# Load the ControlNet and plug it into a Stable Diffusion v1-5 pipeline.
controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# UniPC converges in relatively few steps; CPU offloading keeps VRAM usage low.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]

image.save("images/image_out.png")
```

![input](./images/input.png)

![lineart control image](./images/control.png)

![generated image](./images/image_out.png)
## Other released v1.1 checkpoints

The authors released 14 different checkpoints, each trained with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on a different type of conditioning:

| Model Name | Control Image Overview | Condition Image | Control Image Example | Generated Image Example |
|---|---|---|---|---|
|[lllyasviel/control_v11p_sd15_canny](https://huggingface.co/lllyasviel/control_v11p_sd15_canny)<br/> | Trained with canny edge detection | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11e_sd15_ip2p](https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p)<br/> | Trained with pixel to pixel instruction | No condition.|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_inpaint](https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint)<br/> | Trained with image inpainting | No condition.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"/></a>|
|[lllyasviel/control_v11p_sd15_mlsd](https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd)<br/> | Trained with multi-level line segment detection | An image with annotated line segments.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11f1p_sd15_depth](https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth)<br/> | Trained with depth estimation | An image with depth information, usually represented as a grayscale image.|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_normalbae](https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae)<br/> | Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_seg](https://huggingface.co/lllyasviel/control_v11p_sd15_seg)<br/> | Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_lineart](https://huggingface.co/lllyasviel/control_v11p_sd15_lineart)<br/> | Trained with line art generation | An image with line art, usually black lines on a white background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15s2_lineart_anime](https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime)<br/> | Trained with anime line art generation | An image with anime-style line art.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_openpose](https://huggingface.co/lllyasviel/control_v11p_sd15_openpose)<br/> | Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_scribble](https://huggingface.co/lllyasviel/control_v11p_sd15_scribble)<br/> | Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_softedge](https://huggingface.co/lllyasviel/control_v11p_sd15_softedge)<br/> | Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11e_sd15_shuffle](https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle)<br/> | Trained with image shuffling | An image with shuffled patches or regions.|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11f1e_sd15_tile](https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile)<br/> | Trained with image tiling | A blurry image or part of an image.|<a href="https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/original.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/original.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/output.png"/></a>|
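
Because every checkpoint in the table exposes the same `diffusers` interface, switching conditionings is essentially a one-line change, plus the matching detector from `controlnet_aux`. A minimal sketch, not part of the original card, reusing the pipeline setup from the example above:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical swap: any checkpoint from the table can replace the lineart one;
# the control image passed to the pipeline must then match the new conditioning
# (here, a Canny edge map instead of a lineart map).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```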
## More information

For more information, please have a look at the [Diffusers ControlNet blog post](https://huggingface.co/blog/controlnet) and the [official ControlNet v1.1 repository](https://github.com/lllyasviel/ControlNet-v1-1-nightly).
config.json ADDED
@@ -0,0 +1,42 @@
```json
{
  "_class_name": "ControlNetModel",
  "_diffusers_version": "0.16.0.dev0",
  "_name_or_path": "/home/patrick/controlnet_v1_1/control_v11p_sd15_lineart",
  "act_fn": "silu",
  "attention_head_dim": 8,
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "class_embed_type": null,
  "conditioning_embedding_out_channels": [
    16,
    32,
    96,
    256
  ],
  "controlnet_conditioning_channel_order": "rgb",
  "cross_attention_dim": 768,
  "down_block_types": [
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "DownBlock2D"
  ],
  "downsample_padding": 1,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_scale_factor": 1,
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "projection_class_embeddings_input_dim": null,
  "resnet_time_scale_shift": "default",
  "upcast_attention": false,
  "use_linear_projection": false
}
```
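
As a quick sanity check, the config above can be read without downloading the weights through `diffusers`' config-loading API. A minimal sketch, with the repo id taken from the README above:

```python
from diffusers import ControlNetModel

# Load only the config (no weights) and inspect the key dimensions.
config = ControlNetModel.load_config("ControlNet-1-1-preview/control_v11p_sd15_lineart")
print(config["block_out_channels"])   # [320, 640, 1280, 1280], matching SD 1.5's UNet
print(config["cross_attention_dim"])  # 768, the CLIP ViT-L/14 text embedding width
```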
control_net_lineart.py ADDED
@@ -0,0 +1,50 @@
```python
#!/usr/bin/env python3
# Developer script: run a lineart ControlNet checkpoint on a test image
# and upload the result to a dataset repo for inspection.
import os
import sys
from pathlib import Path

import torch
from controlnet_aux import LineartDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image
from huggingface_hub import HfApi

# The checkpoint to test is passed as the first command-line argument.
checkpoint = sys.argv[1]

# Test image (an alternative test image lives at .../test_imgs/bag.png).
url = "https://github.com/lllyasviel/ControlNet-v1-1-nightly/raw/main/test_imgs/person_1.jpeg"
image = load_image(url)

prompt = "michael jackson concert"

# Extract the lineart conditioning image and save it for debugging.
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
image = processor(image)
image.save("/home/patrick/images/check.png")

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
out_image = pipe(prompt, num_inference_steps=30, generator=generator, image=image).images[0]

path = os.path.join(Path.home(), "images", "aa.png")
out_image.save(path)

# Upload the result so it can be viewed in the browser.
api = HfApi()
api.upload_file(
    path_or_fileobj=path,
    path_in_repo=path.split("/")[-1],
    repo_id="patrickvonplaten/images",
    repo_type="dataset",
)
print("https://huggingface.co/datasets/patrickvonplaten/images/blob/main/aa.png")
```
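
Since the script reads the checkpoint id from `sys.argv[1]`, a typical invocation would presumably look like:

```sh
$ python control_net_lineart.py lllyasviel/control_v11p_sd15_lineart
```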
diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4aaffceded3f8568279387ffeca14144795550fab30caac46ef231f3640f47bd
size 722698343
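
This is a Git LFS pointer for the half-precision weights (roughly 0.7 GB). A minimal sketch of loading them explicitly; in recent `diffusers` versions, `variant="fp16"` is the standard way to select `*.fp16.bin` weight files, assuming this repo exposes them under that naming:

```python
import torch
from diffusers import ControlNetModel

# Select the .fp16.bin weights uploaded in this commit.
controlnet = ControlNetModel.from_pretrained(
    "ControlNet-1-1-preview/control_v11p_sd15_lineart",
    torch_dtype=torch.float16,
    variant="fp16",
)
```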
images/control.png ADDED
images/image_out.png ADDED
images/input.png ADDED