patrickvonplaten committed on
Commit 86a12be
1 Parent(s): 673a573
Files changed (1)
  1. README.md +19 -37
README.md CHANGED
@@ -7,12 +7,12 @@ tags:
 - stable-diffusion
 ---
 
- # Controlnet - v1.1 - *Canny Version*
+ # Controlnet - v1.1 - *normalbae Version*
 
- **Controlnet v1.1** is the successor model of [Controlnet v1.0](https://huggingface.co/lllyasviel/sd-controlnet-canny)
+ **Controlnet v1.1** is the successor model of [Controlnet v1.0](https://huggingface.co/lllyasviel/ControlNet)
 and was released in [lllyasviel/ControlNet-v1-1](https://huggingface.co/lllyasviel/ControlNet-v1-1) by [Lvmin Zhang](https://huggingface.co/lllyasviel).
 
- This checkpoint is a conversion of [the original checkpoint](https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth) into `diffusers` format.
+ This checkpoint is a conversion of [the original checkpoint](https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_normalbae.pth) into `diffusers` format.
 It can be used in combination with **Stable Diffusion**, such as [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5).
 
 
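For orientation, a minimal sketch of loading the converted checkpoint next to Stable Diffusion 1.5 is shown below; the repository ids are the same ones used in the full example further down, everything else is illustrative rather than prescriptive.

```py
# Minimal sketch (repository ids taken from the example later in this card);
# see the full example below for the complete image-to-image flow.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "ControlNet-1-1-preview/control_v11p_sd15_normalbae", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```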
@@ -23,7 +23,7 @@ ControlNet is a neural network structure to control diffusion models by adding e
 
 ![img](./sd.png)
 
- This checkpoint corresponds to the ControlNet conditioned on **Canny edges**.
+ This checkpoint corresponds to the ControlNet conditioned on **normalbae images**.
 
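The conditioning image for this checkpoint is a per-pixel surface-normal map, here estimated with the BAE normal estimator wrapped by `controlnet_aux`. A small sketch of producing one with the same `NormalBaeDetector` annotator used in the full example below; the input path is a placeholder.

```py
# Sketch: estimate a surface-normal map to use as the ControlNet conditioning.
# Same NormalBaeDetector annotator as in the full example; input path is a placeholder.
from controlnet_aux import NormalBaeDetector
from diffusers.utils import load_image

processor = NormalBaeDetector.from_pretrained("lllyasviel/Annotators")
image = load_image("path/to/input.png")   # any RGB image (local path or URL)
control_image = processor(image)          # PIL image encoding the estimated normals
control_image.save("control.png")
```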
 ## Model Details
 - **Developed by:** Lvmin Zhang, Maneesh Agrawala
@@ -64,10 +64,10 @@ Experimentally, the checkpoint can be used with other diffusion models such as d
 
 **Note**: If you want to process an image to create the auxiliary conditioning, external dependencies are required as shown below:
 
- 1. Install [opencv](https://opencv.org/):
+ 1. Install [controlnet_aux](https://github.com/patrickvonplaten/controlnet_aux):
 
 ```sh
- $ pip install opencv-contrib-python
+ $ pip install controlnet_aux==0.3.0
 ```
 
 2. Let's install `diffusers` and related packages:
@@ -84,9 +84,9 @@ import os
 from huggingface_hub import HfApi
 from pathlib import Path
 from diffusers.utils import load_image
- import numpy as np
- import cv2
 from PIL import Image
+ import numpy as np
+ from controlnet_aux import NormalBaeDetector
 
 from diffusers import (
 ControlNetModel,
@@ -94,22 +94,16 @@ from diffusers import (
 UniPCMultistepScheduler,
 )
 
- checkpoint = "ControlNet-1-1-preview/control_v11p_sd15_canny"
+ checkpoint = "ControlNet-1-1-preview/control_v11p_sd15_normalbae"
 
 image = load_image(
- "https://huggingface.co/ControlNet-1-1-preview/control_v11p_sd15_canny/resolve/main/images/input.png"
+ "https://huggingface.co/ControlNet-1-1-preview/control_v11p_sd15_normalbae/resolve/main/images/input.png"
 )
 
- image = np.array(image)
-
- low_threshold = 100
- high_threshold = 200
-
- image = cv2.Canny(image, low_threshold, high_threshold)
- image = image[:, :, None]
- image = np.concatenate([image, image, image], axis=2)
- control_image = Image.fromarray(image)
+ prompt = "A head full of roses"
+ processor = NormalBaeDetector.from_pretrained("lllyasviel/Annotators")
+
+ control_image = processor(image)
 
 control_image.save("./images/control.png")
 
 controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
@@ -120,10 +114,11 @@ pipe = StableDiffusionControlNetPipeline.from_pretrained(
 pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
 pipe.enable_model_cpu_offload()
 
- generator = torch.manual_seed(33)
- image = pipe("a blue paradise bird in the jungle", num_inference_steps=20, generator=generator, image=control_image).images[0]
+ generator = torch.manual_seed(0)
+ image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
 
 image.save('images/image_out.png')
+
 ```
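Continuing from the example above, the influence of the normal-map conditioning can typically be tuned with the pipeline's `controlnet_conditioning_scale` argument; the value below is illustrative.

```py
# Continuation of the example above (pipe, prompt, generator and control_image
# already defined); 1.0 applies the conditioning at full strength.
image = pipe(
    prompt,
    num_inference_steps=30,
    generator=generator,
    image=control_image,
    controlnet_conditioning_scale=0.8,
).images[0]
image.save('images/image_out_scaled.png')
```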
 
 ![bird](./images/input.png)
@@ -139,25 +134,12 @@ on a different type of conditioning:
 
 | Model Name | Control Image Overview| Control Image Example | Generated Image Example |
 |---|---|---|---|
- |[lllyasviel/control_v11p_sd15_canny](https://huggingface.co/lllyasviel/control_v11p_sd15_canny)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_canny.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_canny.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"/></a>|
- |[lllyasviel/control_v11p_sd15_mlsd](https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd)<br/> *Trained with Midas depth estimation* |A grayscale image with black representing deep areas and white representing shallow areas.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_depth.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_depth.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"/></a>|
- |[lllyasviel/control_v11p_sd15_depth](https://huggingface.co/lllyasviel/control_v11p_sd15_depth)<br/> *Trained with HED edge detection (soft edge)* |A monochrome image with white soft edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_hed.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_hed.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"/></a> |
- |[lllyasviel/control_v11p_sd15_normalbae](https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae)<br/> *Trained with M-LSD line detection* |A monochrome image composed only of white straight lines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_mlsd.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_mlsd.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"/></a>|
- |[lllyasviel/control_v11p_sd15_inpaint](https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint)<br/> *Trained with normal map* |A [normal mapped](https://en.wikipedia.org/wiki/Normal_mapping) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_normal.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_normal.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"/></a>|
- |[lllyasviel/control_v11p_sd15_lineart](https://huggingface.co/lllyasviel/control_v11p_sd15_lineart)<br/> *Trained with OpenPose bone image* |A [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_openpose.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_openpose.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"/></a>|
- |[lllyasviel/control_v11p_sd15s2_lineart_anime](https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime)<br/> *Trained with human scribbles* |A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_scribble.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_scribble.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"/></a> |
- |[lllyasviel/control_v11p_sd15_openpose](https://huggingface.co/lllyasviel/control_v11p_sd15_openpose)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
- |[lllyasviel/control_v11p_sd15_scribble](https://huggingface.co/lllyasviel/control_v11p_sd15_scribble)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
- |[lllyasviel/control_v11p_sd15_softedge](https://huggingface.co/lllyasviel/control_v11p_sd15_softedge)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
- |[lllyasviel/control_v11e_sd15_shuffle](https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
- |[lllyasviel/control_v11e_sd15_ip2p](https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
- |[lllyasviel/control_v11u_sd15_tile](https://huggingface.co/lllyasviel/control_v11u_sd15_tile)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
+ TODO
 
 ### Training
 
- The v1.1 canny edge model was resumed from [Controlnet v1.0](https://huggingface.co/lllyasviel/sd-controlnet-canny) and trained for a further 200 GPU hours on A100 80GB with edge-image, caption pairs, using Stable Diffusion 1.5 as a base model.
+ TODO
 
 ### Blog post
 
- For more information, please also have a look at the [Diffusers ControlNet Blog Post](https://huggingface.co/blog/controlnet).
+ For more information, please also have a look at the [Diffusers ControlNet Blog Post](https://huggingface.co/blog/controlnet).
 