---
license: cc-by-nc-sa-4.0
datasets:
- ChristophSchuhmann/improved_aesthetics_6.5plus
language:
- en
tags:
- controlnet
---

Controls image generation by edge maps generated with [Edge Drawing](https://github.com/CihanTopal/ED_Lib). Note that Edge Drawing comes in different flavors: original (_ed_), parameter free (_edpf_), color (_edcolor_).

* Based on my monologues at [github.com - Edge Drawing](https://github.com/lllyasviel/ControlNet/discussions/318)
* For usage see the model page on [civitai.com - Model](https://civitai.com/models/149740).
* To generate edpf maps you can use [this space](https://huggingface.co/spaces/GeroldMeisinger/edpf) or [this script at gitlab.com](https://gitlab.com/-/snippets/3601881).
* For evaluation see the corresponding .zip archives with images in the repository files.
* To run your own evaluations you can use [this script at gitlab.com](https://gitlab.com/-/snippets/3602096).

**Edge Drawing Parameter Free**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0ec65a2ec8cb2f589233a/jmdCGeMJx4dKFGo44cuEq.png)

_Clear and pristine! Wooow!_

**Example**

sampler=UniPC steps=20 cfg=7.5 seed=0 batch=9 model: v1-5-pruned-emaonly.safetensors cherry-picked: 1/9

prompt: _a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather_

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0ec65a2ec8cb2f589233a/2PSWsmzLdHeVG-i67S7jF.png)

**Canny Edge for comparison (default in Automatic1111)**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0ec65a2ec8cb2f589233a/JZTpa-HZfw0NUYnxZ52Iu.png)

_Noise, artifacts and missing edges. Yuck! Ugh!_
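
For orientation, a minimal diffusers sketch that roughly mirrors the example settings above. The ControlNet path, the edge-map file and the exact loading route are assumptions (the model is published on civitai, see above), not a confirmed recipe:

```
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# placeholder paths: point these at the actual ControlNet weights and an edpf edge map
controlnet = ControlNetModel.from_pretrained("path/to/control-edgedrawing", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)  # sampler=UniPC

edge_map = load_image("edpf_map.png")  # conditioning image produced by the edpf script
image = pipe(
    "a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather",
    image=edge_map,
    num_inference_steps=20,          # steps=20
    guidance_scale=7.5,              # cfg=7.5
    generator=torch.manual_seed(0),  # seed=0
).images[0]
image.save("out.png")
```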

# Image dataset

* [laion2B-en aesthetics>=6.5 dataset](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus)
* `--min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512`
* resulting in 180k images
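
These parameters look like [img2dataset](https://github.com/rom1504/img2dataset) settings; a hedged sketch of a matching download call (paths, column names and output format are assumptions):

```
from img2dataset import download

# assumption: the laion2B-en aesthetics>=6.5 parquet shards were downloaded locally first
download(
    url_list="improved_aesthetics_6.5plus/",
    input_format="parquet",
    url_col="URL",
    caption_col="TEXT",
    output_folder="laion_aesthetics_512/",
    output_format="files",
    image_size=512,
    resize_mode="center_crop",
    min_image_size=512,
    max_aspect_ratio=2.0,
)
```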

# Training

```
accelerate launch train_controlnet.py ^
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
  --output_dir="control-edgedrawing-[version]-fp16/" ^
  --dataset_name="mydataset" ^
  --mixed_precision="fp16" ^
  --resolution=512 ^
  --learning_rate=1e-5 ^
  --train_batch_size=1 ^
  --gradient_accumulation_steps=4 ^
  --gradient_checkpointing ^
  --use_8bit_adam ^
  --enable_xformers_memory_efficient_attention ^
  --set_grads_to_none ^
  --seed=0
```
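
`--dataset_name="mydataset"` points to a custom dataset. By default diffusers' `train_controlnet.py` expects the columns `image`, `conditioning_image` and `text` (overridable via `--image_column`, `--conditioning_image_column` and `--caption_column`); a hypothetical sketch of assembling such a dataset:

```
from datasets import Dataset, Image

# hypothetical: pair each training image with its edge map and caption
records = {
    "image":              ["images/000001.png", "images/000002.png"],
    "conditioning_image": ["edges/000001.png",  "edges/000002.png"],
    "text":               ["caption for image 1", "caption for image 2"],
}
ds = Dataset.from_dict(records)
ds = ds.cast_column("image", Image()).cast_column("conditioning_image", Image())
ds.push_to_hub("youruser/mydataset")  # or point --dataset_name at a local dataset instead
```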

# Evaluation

To evaluate the model it makes sense to compare it with the original Canny model. Original evaluations and comparisons are available at [ControlNet 1.0 repo](https://github.com/lllyasviel/ControlNet), [ControlNet 1.1 repo](https://github.com/lllyasviel/ControlNet-v1-1-nightly), [ControlNet paper v1](https://arxiv.org/abs/2302.05543v1), [ControlNet paper v2](https://arxiv.org/abs/2302.05543) and [Diffusers implementation](https://huggingface.co/takuma104/controlnet_dev/tree/main). Some points we have to keep in mind when comparing canny with edpf in order not to compare apples with oranges:
* The canny 1.0 model was trained on 3M images with fp32, the canny 1.1 model on even more, while the edpf model so far is only trained on 180k-360k images with fp16.
* The canny edge detector requires parameter tuning while edpf is parameter free (see the sketch after this list).
* Should we manually tune the canny parameters to find the perfect input image or leave them at default? We could argue that "no tuning required" is the USP of edpf and we want to compare in the default setting, whereas canny tuning is subjective.
* Would the canny model actually benefit from an edpf pre-processor, so that we might not even require a specialized edpf model? (2023-09-25: see `eval_canny_edpf.zip`, but it seems as if this doesn't work, so the edpf model may be justified.)
* When evaluating human images we need to be aware of Stable Diffusion's inherent limits, like deformed faces and hands, and not attribute them to the ControlNet.
* When evaluating style we need to be aware of the bias from the image dataset (`laion2b-en-aesthetics65`), which might tend towards generating "aesthetic" images rather than actually working "intrinsically better".
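
To illustrate the tuning point: a minimal sketch (thresholds chosen arbitrarily) of the per-image parameters canny needs, in contrast to the parameter-free EDPF call shown under experiment 3.0:

```
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Canny needs two hysteresis thresholds that typically have to be tuned per image ...
canny_map = cv2.Canny(img, threshold1=100, threshold2=200)

# ... whereas the EDPF detector (see experiment 3.0) has no thresholds to tune.
```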

# Versions

**Experiment 1 - 2023-09-19 - control-edgedrawing-default-drop50-fp16-checkpoint-40000**

Images converted with https://github.com/shaojunluo/EDLinePython (based on the original, non-parameter-free Edge Drawing). Default settings are:

`smoothed=False`

```
{ 'ksize'            :  5
, 'sigma'            :  1.0
, 'gradientThreshold': 36
, 'anchorThreshold'  :  8
, 'scanIntervals'    :  1
}
```

additional arguments: `--proportion_empty_prompts=0.5`.

Trained for 40000 steps with default settings => results are not good. empty prompts were probably too excessive. retry with no drops and different algorithm parameters.

Update 2023-09-22: a bug in the algorithm produces overly sparse images with default settings, see https://github.com/shaojunluo/EDLinePython/issues/4

**Experiment 2 - 2023-09-20 - control-edgedrawing-default-noisy-drop0-fp16-checkpoint-40000**

Same as experiment 1 with `smoothed=True` and `--proportion_empty_prompts=0`.

Trained for 40000 steps with default settings => results are not good. conditioning images look too noisy. investigate algorithm.

**Experiment 3.0 - 2023-09-22 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-45000**

Conditioning images generated with [edpf.py](https://gitlab.com/-/snippets/3601881) using [opencv-contrib-python::ximgproc::EdgeDrawing](https://docs.opencv.org/4.8.0/d1/d1c/classcv_1_1ximgproc_1_1EdgeDrawing.html).

```
import cv2

# EDPF (parameter-free Edge Drawing) via opencv-contrib's ximgproc module
image  = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # the detector expects a grayscale image

ed     = cv2.ximgproc.createEdgeDrawing()
params = cv2.ximgproc.EdgeDrawing.Params()
params.PFmode = True          # enable parameter-free mode
ed.setParams(params)
ed.detectEdges(image)         # results are stored inside the detector
edge_map = ed.getEdgeImage()  # binary edge map with the same size as the input
```

45000 steps => looks good. resuming with left-right flipped images. released as **version 0.1 on civitai**.

**Experiment 3.1 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-90000**

90000 steps (45000 steps on original, 45000 steps with left-right flipped images) => quality became better, might release as 0.2 on civitai.

**Experiment 3.2 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0+50-fp16-checkpoint-118000**

resumed for epoch 2 from 90000 steps using `--proportion_empty_prompts=0.5` => results became worse, the ControlNet didn't pick up on empty prompts (I also tried the intermediate checkpoint-104000). restarting with 50% drop.

**Experiment 4.0 - 2023-09-25 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-45000**

see experiment 3.0. restarted from 0 with `--proportion_empty_prompts=0.5` => results are not good, 50% is probably too much for 45k steps. guessmode still doesn't work and tends to produce humans. resuming until 90k with left-right flipped images in the hope it will get better with more images.

**Experiment 4.1 - 2023-09-26 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-90000**

resumed from 45000 steps with left-right flipped images until 90000 steps => results are still not good, 50% is probably also too much for 90k steps. guessmode still doesn't work and tends to produce humans. aborting.

**Experiment 5.0 - 2023-09-28 - control-edgedrawing-cv480edpf-fastdup-fp16-checkpoint-45000**

see experiment 3. cleaned original images following the [fastdup introduction](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb) resulting in:
```
180210 images in total
 67854 duplicates
   644 outliers
    26 too dark
   321 too bright
    57 blurry
 68621 unique removed (that's 38%!)
```
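
A minimal sketch of the cleaning run, following the linked notebook (the exact API may differ between fastdup versions, and the input path is a placeholder):

```
import fastdup

fd = fastdup.create(work_dir="fastdup_work", input_dir="laion_aesthetics_512/")
fd.run()  # computes duplicates, outliers and basic image statistics

# duplicates, outliers and too-dark/too-bright/blurry images are then reviewed
# and removed as shown in the notebook before training is restarted
```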

restarted from 0 with left-right flipped images and `--mixed_precision="no"` to create a master release and converted to fp16 afterwards.
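
The fp16 conversion could look roughly like this (a sketch, assuming the trained weights load as a diffusers `ControlNetModel`; paths are placeholders):

```
import torch
from diffusers import ControlNetModel

# load the fp32 master weights and save an fp16 copy
controlnet = ControlNetModel.from_pretrained("path/to/trained-controlnet")
controlnet.to(torch.float16).save_pretrained("control-edgedrawing-fp16")
```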

**Experiment 6.0 - control-edgedrawing-cv480edpf-rect-fp16-checkpoint-XXXXX**

see experiment 5.0.
* included images with aspect ratio > 2
* resized images with the short side to 512, which gives us rectangular images instead of 512x512 squares
* center-cropped images to 512x(n*64) (to make them SD compatible) with a max long side of 1024 (see the sketch below the statistics)
* sorted duplicates by `similarity` value from `laion2b-en-aesthetics65` to get the best `text` of all duplicates

```
183410 images in total
 75686 duplicates
   381 outliers
    50 too dark
   436 too bright
    31 blurry
 76288 unique removed (that's 42%!)
```
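
A rough sketch of that resize-and-crop step with Pillow (function name and exact rounding are assumptions):

```
from PIL import Image

def resize_and_crop(img, short=512, step=64, max_long=1024):
    # resize so the short side is exactly `short`, keeping the aspect ratio
    w, h = img.size
    scale = short / min(w, h)
    w, h = round(w * scale), round(h * scale)
    img = img.resize((w, h), Image.LANCZOS)
    # center-crop the long side down to a multiple of `step`, capped at `max_long`
    target_w = min(max_long, w // step * step) if w >= h else short
    target_h = min(max_long, h // step * step) if h > w else short
    left, top = (w - target_w) // 2, (h - target_h) // 2
    return img.crop((left, top, left + target_w, top + target_h))
```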

restarted from 0 with `--mixed_precision="fp16"`.

# Ideas

* make conceptual captions for laion
* integrate edcolor
* try to fine-tune from canny
* image dataset with better captions (cc3m)
* remove images by semantic (use only photos, paintings etc. for edge detection)
* re-train with fp32

# Question and answers

**Q: What's the point of another edge control net anyway?**

A: 🤷