---
license: cc-by-nc-4.0
---
# ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

![img](assets/teaser.png)

(March 2025)
The foundational paper for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
Version 1 - trained at 512px (WACV 2025): [ColorizeDiffusion](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html). Basic reference-based training. Released.
Version 1.5 - trained at 512px (CVPR 2025): [ColorizeDiffusion 1.5 (e-preprint)](https://arxiv.org/html/2502.19937v1). Solves spatial entanglement. Released.
Version 2 - trained at 768px, paper and code: Enhances background and style transfer. Available soon.
Version XL - trained at 1024px: Enhances embedding guidance for character colorization and geometry disentanglement. Ongoing.

Model weights are available at https://huggingface.co/tellurion/colorizer.

## Implementation Details
This repository offers the implementation of ColorizeDiffusion.
Currently, only the noisy model introduced in the paper, which utilizes the local tokens, is included.
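
For orientation, the local tokens referred to here are the per-patch embeddings of a CLIP-style image encoder (as opposed to the single global token). Below is a minimal sketch of extracting such tokens with the `transformers` CLIP vision model; the specific encoder, checkpoint, and layer used by this repository are assumptions, not confirmed details.

```python
# Hedged sketch: extract local (per-patch) tokens from a CLIP vision encoder.
# The checkpoint and layer choice are assumptions, not necessarily what this repo uses.
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

reference = Image.open("assets/teaser.png").convert("RGB")
inputs = processor(images=reference, return_tensors="pt")
hidden = model(**inputs).last_hidden_state   # (1, 1 + num_patches, dim)
local_tokens = hidden[:, 1:, :]              # drop the global class token, keep patch tokens
print(local_tokens.shape)                    # e.g. (1, 256, 1024) for ViT-L/14 at 224px
```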

## Getting Started
To use the code in this repository, make sure you have installed the dependencies specified in `environment.yaml`.

### To install and run:
```shell
conda env create -f environment.yaml
conda activate hf
```
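
If the released weights need to be fetched programmatically, a minimal sketch using `huggingface_hub` follows; the local directory name is an arbitrary choice, and downloading through the Hub web page works just as well.

```python
# Hedged sketch: download the released weights from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed; "weights/" is an arbitrary local directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="tellurion/colorizer",  # repository referenced in this README
    local_dir="weights",            # assumption: any writable path works
)
print(f"Weights downloaded to: {local_dir}")
```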

## User Interface:
We also provide a Web UI based on Gradio. To run it, just:
```shell
python -u app.py
```
You can then browse the UI at http://localhost:7860/.
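
For orientation, here is a minimal sketch of how a Gradio interface of this kind can be wired; the `colorize` function and its arguments are hypothetical placeholders, not the actual entry points of `app.py`.

```python
# Hedged sketch of a Gradio UI for reference-based sketch colorization.
# `colorize` is a hypothetical placeholder, not the function used by app.py.
import gradio as gr
from PIL import Image

def colorize(sketch: Image.Image, reference: Image.Image) -> Image.Image:
    # Placeholder: a real implementation would run the diffusion sampler here.
    return reference

demo = gr.Interface(
    fn=colorize,
    inputs=[gr.Image(type="pil", label="Sketch"), gr.Image(type="pil", label="Reference")],
    outputs=gr.Image(type="pil", label="Colorized result"),
)
demo.launch(server_port=7860)  # matches the default address noted above
```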

### Inference:
-------------------------------------------------------------------------------------------
#### Important inference options:
| Options | Description |
|:--------------------------|:----------------------------------------------------------------------------------|
| Mask guide mode | Activates mask-guided attention and the corresponding LoRA weights for colorization. |
| Crossattn scale | Used to diminish artifacts caused by the distribution problem. |
| Pad reference with margin | Used to diminish spatial entanglement; pads the reference to T times its current width. |
| Reference guidance scale | Classifier-free guidance scale of the reference image; suggested 5 (see the sketch after this table). |
| Sketch guidance scale | Classifier-free guidance scale of the sketch image; suggested 1. |
| Attention injection | Strengthens similarity with the reference. |
| Visualize | Used for local manipulation. Visualizes the regions selected by each threshold. |
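
The reference and sketch guidance scales act as two separate classifier-free guidance terms. A minimal sketch of one common way such terms are combined is shown below; the exact composition used by this repository is not specified here, so treat it as an assumption.

```python
# Hedged sketch of two-condition classifier-free guidance (assumed composition).
# eps_* are denoiser predictions under different conditioning; shapes are illustrative.
import torch

def combine_guidance(eps_uncond: torch.Tensor,
                     eps_sketch: torch.Tensor,
                     eps_full: torch.Tensor,
                     sketch_scale: float = 1.0,
                     ref_scale: float = 5.0) -> torch.Tensor:
    """eps_uncond: unconditional, eps_sketch: sketch only, eps_full: sketch + reference."""
    return (eps_uncond
            + sketch_scale * (eps_sketch - eps_uncond)
            + ref_scale * (eps_full - eps_sketch))
```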

For artifacts such as spatial entanglement (the distribution problem discussed in the paper), shown below, please activate background enhance (optionally with foreground enhance).
![img](assets/entanglement.png)

### Manipulation:
The colorization results can be manipulated using text prompts.

For local manipulation, a visualization is provided that shows the correlation between each prompt and the tokens of the reference image.

The manipulation result and correlation visualization for the following settings:

- Target prompt: the girl's blonde hair
- Anchor prompt: the girl's brown hair
- Control prompt: the girl's brown hair
- Target scale: 8
- Enhanced: false
- Thresholds: 0.5, 0.55, 0.65, 0.95

![img](assets/preview1.png)
![img](assets/preview2.png)
As shown above, the manipulation unavoidably changes some unrelated regions, since it is applied to the reference embeddings.
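
Conceptually, the manipulation shifts the reference embeddings along the direction from the anchor-prompt embedding to the target-prompt embedding, scaled by the target scale. A minimal sketch under that assumption follows; the tensor shapes are illustrative, and the repository's actual manipulation and enhancement logic may differ.

```python
# Hedged sketch of text-guided manipulation applied to reference embeddings.
# The shift-along-a-text-direction formulation is an assumption for illustration.
import torch

def manipulate(reference_tokens: torch.Tensor,  # (num_tokens, dim) reference image tokens
               target_emb: torch.Tensor,        # embedding of the target prompt
               anchor_emb: torch.Tensor,        # embedding of the anchor prompt
               target_scale: float = 8.0) -> torch.Tensor:
    """Shift every reference token along (target - anchor), scaled by target_scale."""
    direction = target_emb - anchor_emb
    direction = direction / direction.norm()
    return reference_tokens + target_scale * direction
```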

#### Manipulation options:
| Options | Description |
| :----- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Group index | The index of the selected manipulation sequence's parameter group. |
| Target prompt | The prompt that specifies the desired visual attribute of the image after manipulation. |
| Anchor prompt | The prompt that specifies the anchored visual attribute of the image before manipulation. |
| Control prompt | Used for local manipulation (crossattn-based models). The prompt that specifies the target regions. |
| Enhance | Specifies whether this manipulation should be enhanced (more likely to influence unrelated attributes). |
| Target scale | The scale used to progressively control the manipulation. |
| Thresholds | Used for local manipulation (crossattn-based models). Four hyperparameters that reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0 (see the sketch after this table). |
| \<Threshold0 | Selects the regions most related to the control prompt. Indicated by deep blue. |
| Threshold0-Threshold1 | Selects regions related to the control prompt. Indicated by blue. |
| Threshold1-Threshold2 | Selects neighbouring but unrelated regions. Indicated by green. |
| Threshold2-Threshold3 | Selects unrelated regions. Indicated by orange. |
| \>Threshold3 | Selects the most unrelated regions. Indicated by brown. |
| Add | Click to save the current manipulation in the sequence. |
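
As a rough illustration of how the four thresholds could partition the reference tokens into the five colored groups, consider the sketch below; the score definition (a normalized dissimilarity, where lower means "more related" as in the table) and the tensor shapes are assumptions, not the repository's exact computation.

```python
# Hedged sketch: group reference tokens by their correlation with the control prompt.
# The score definition is an assumption; lower scores mean "more related" per the table.
import torch
import torch.nn.functional as F

def bucket_tokens(reference_tokens: torch.Tensor,  # (num_tokens, dim)
                  control_emb: torch.Tensor,       # (dim,) control-prompt embedding
                  thresholds=(0.5, 0.55, 0.65, 0.95)) -> torch.Tensor:
    """Return a bucket index per token: 0 = most related ... 4 = most unrelated."""
    cos = F.cosine_similarity(reference_tokens, control_emb[None, :], dim=-1)
    score = (1.0 - cos) / 2.0                       # map [-1, 1] similarity to [0, 1]
    return torch.bucketize(score, torch.tensor(thresholds))
```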

## Code reference
1. [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
2. [Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
3. [SD-webui-ControlNet](https://github.com/Mikubill/sd-webui-controlnet)
4. [Stable-Diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
5. [K-diffusion](https://github.com/crowsonkb/k-diffusion)
6. [DeepSpeed](https://github.com/microsoft/DeepSpeed)
7. [sketchKeras-PyTorch](https://github.com/higumax/sketchKeras-pytorch)

## Citation
```
@article{2024arXiv240101456Y,
  author  = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},
  title   = "{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}",
  journal = {arXiv e-prints},
  year    = {2024},
  doi     = {10.48550/arXiv.2401.01456},
}

@InProceedings{Yan_2025_WACV,
  author    = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
  title     = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2025},
  pages     = {5092-5102}
}

@article{2025arXiv250219937Y,
  author  = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},
  title   = "{Image Referenced Sketch Colorization Based on Animation Creation Workflow}",
  journal = {arXiv e-prints},
  year    = {2025},
  doi     = {10.48550/arXiv.2502.19937},
}
```