tellurion committed · Commit 54eca88 · verified · 1 Parent(s): 9d5f259

Update README.md

  1. README.md +56 -39
README.md CHANGED
@@ -1,59 +1,69 @@
- ---
- license: cc-by-nc-4.0
- ---
  # ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

- ![img](assets/teaser.png)

- (March. 2025)
- Fundemental issue for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
- Version 1 - trained with 512px (WACV 2025): [ColorizeDiffusion](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html) Basic reference-based training. Released.
- Version 1.5 - trained with 512px (CVPR 2025): [ColorizeDiffusion 1.5 (e-preprint)](https://arxiv.org/html/2502.19937v1) Solving spatial entangelment. Released.
- Version 2 - trained with 768px, paper and code: Enhancing background and style transfer. Available soon.
- Version XL - trained with 1024px : Enhancing embedding guidance for character colorization, geometry disentanglement. Ongoing.

- Model weights are available: https://huggingface.co/tellurion/colorizer.
- Code: https://github.com/tellurion-kanata/colorizeDiffusion

- ## Implementation Details
- The repository offers the implementation of ColorizeDiffusion.
- Now, only the noisy model introduced in the paper, which utilizes the local tokens.

  ## Getting Start
- To utilize the code in this repository, ensure that you have installed the required dependencies as specified in the requirements.

- ### To install and run:
  ```shell
  conda env create -f environment.yaml
  conda activate hf
  ```

- ## User Interface:
- We also provided a Web UI based on Gradio UI. To run it, just:
  ```shell
  python -u app.py
  ```
- Then you can browse the UI in http://localhost:7860/.
-
- ### Inference:
- -------------------------------------------------------------------------------------------
- #### Important inference options:
- | Options | Description |
- |:--------------------------|:----------------------------------------------------------------------------------|
- | Mask guide mode | Activate mask guided attention and corresponding lora weights for colorization. |
- | Crossattn scale | Used to diminish all kinds of artifacts caused by the distribution problem. |
- | Pad reference with margin | Used to diminish spatial entanglement, pad reference to T times of current width. |
- | Reference guidance scale | Classifier-free guidance scale of the reference image, suggested 5. |
- | Sketch guidance scale | Classifier-free guidance scale of the sketch image, suggested 1. |
- | Attention injection | Strengthen similarity with reference. |
- | Visualize | Used for local manipulation. Visualize the regions selected by each threshold. |
-
- For artifacts like spatial entanglement (the distribution problem discussed in the paper) like this
  ![img](assets/entanglement.png)
  Please activate background enhance (optionally with foreground enhance).

- ### Manipulation:
- The colorization results can be manipulated using text prompts.

  For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.

@@ -71,7 +81,7 @@ The manipulation result and correlation visualization of the settings:
71
  ![img](assets/preview2.png)
72
  As you can see, the manipluation unavoidably changed some unrelated regions as it is taken on the reference embeddings.
73
 
74
- #### Manipulation options:
75
  | Options | Description |
76
  | :----- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
77
  | Group index | The index of selected manipulation sequences's parameter group. |
@@ -122,4 +132,11 @@ As you can see, the manipluation unavoidably changed some unrelated regions as i
  year = {2025},
  doi = {10.48550/arXiv.2502.19937},
  }
- ```

  # ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

+ <div align="center">
+
+ [![arXiv Paper](https://img.shields.io/badge/arXiv-2407.15886%20(base)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2401.01456)
+ [![WACV 2025](https://img.shields.io/badge/WACV%202025-v1-0CA4A5?style=flat&logo=Semantic%20Web)](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html)
+ [![arXiv v1.5 Paper](https://img.shields.io/badge/arXiv-2502.19937%20(v1.5)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2502.19937)
+ [![arXiv v2 Paper](https://img.shields.io/badge/arXiv-2504.06895%20(v2)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2504.06895)
+ [![Model Weights](https://img.shields.io/badge/Hugging%20Face-Model%20Weights-FF9D00?style=flat&logo=Hugging%20Face)](https://huggingface.co/tellurion/ColorizeDiffusion/tree/main)
+ [![License](https://img.shields.io/badge/License-CC--BY--NC--SA%204.0-4CAF50?style=flat&logo=Creative%20Commons)](https://github.com/tellurion-kanata/colorizeDiffusion/blob/master/LICENSE)

+ </div>
+
+ ![img](assets/teaser.png)

+ (April. 2025)
+ Official implementation of Colorize Diffusion.
+ Fundamental issue for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
+ ***Version 1*** - Base training, 512px. Released, ckpt starts with **mult**.
+ ***Version 1.5*** - Solving spatial entanglement, 512px. Released, ckpt starts with **switch**.
+ ***Version 2*** - Enhancing background and style transfer, 768px. Released, ckpt starts with **v2**.
+ ***Version XL*** - Enhancing embedding guidance for character colorization, geometry disentanglement, 1024px. Available soon.
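
Reviewer note: the added version list ties each released checkpoint to a file-name prefix (**mult**, **switch**, **v2**). A minimal sketch of how a script could use that convention is shown here. The helper name `version_from_checkpoint`, the example file names, and the case-insensitive match are assumptions for illustration and are not part of the repository.

```python
from pathlib import Path
from typing import Optional

# Prefixes copied from the README text in this commit.
# The mapping itself is a hypothetical convenience, not repository code.
CKPT_PREFIX_TO_VERSION = {
    "mult": "Version 1",
    "switch": "Version 1.5",
    "v2": "Version 2",
}


def version_from_checkpoint(path: str) -> Optional[str]:
    """Guess the release a checkpoint belongs to from its file-name prefix."""
    # Assumption: matching is done on the base name, ignoring case.
    name = Path(path).name.lower()
    for prefix, version in CKPT_PREFIX_TO_VERSION.items():
        if name.startswith(prefix):
            return version
    # Unknown prefix, for example a future Version XL checkpoint.
    return None


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the helper.
    for ckpt in ["ckpts/mult_demo.safetensors", "switch_demo.safetensors", "other.ckpt"]:
        print(ckpt, "->", version_from_checkpoint(ckpt))
```

Returning `None` for an unrecognized prefix keeps the helper usable once further releases appear, since the README does not yet state a prefix for Version XL.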

  ## Getting Start

+ -------------------------------------------------------------------------------------------
  ```shell
  conda env create -f environment.yaml
  conda activate hf
  ```

+ ## User Interface
+
+ -------------------------------------------------------------------------------------------
+ We implement a fully-featured UI. To run it, just:
  ```shell
  python -u app.py
  ```
+ The default server address is http://localhost:7860.
+
+ #### Important inference options
+ | Options | Description |
+ |:--------------------------|:----------------------------------------------------------------------------------------------------------------|
+ | Mask guide mode | Activate mask guided attention and corresponding lora weights for colorization. |
+ | Crossattn scale | Used to diminish all kinds of artifacts caused by the distribution problem. |
+ | Pad reference with margin | Used to diminish spatial entanglement, pad reference to T times of current width. |
+ | Reference guidance scale | Classifier-free guidance scale of the reference image, suggested 5. |
+ | Preprocessor | Preprocessing for the sketch input. **Extract** is suggested if the sketch input is complicated pencil drawing. |
+ | Sketch guidance scale | Classifier-free guidance scale of the sketch image, suggested 1. |
+ | Attention injection | Strengthen similarity with reference through self-injection. |
+ | Visualize | Used for local manipulation. Visualize the regions selected by each threshold. |
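
Reviewer note: the table gives suggested classifier-free guidance scales (5 for the reference, 1 for the sketch) without showing what a scale does. The sketch below uses the standard single-condition formula `uncond + scale * (cond - uncond)` on plain Python lists. How the repository combines the reference and sketch scales is not stated in this diff, so the function name `cfg_combine` and the toy values are assumptions.

```python
from typing import List, Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    # Toy noise predictions; real predictions are tensors from the denoiser.
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 1.0))  # [1.0, 3.0]
    print(cfg_combine(uncond, cond, 5.0))  # [5.0, 11.0]
```

Under this formula a scale of 1 returns the conditional prediction unchanged, while a scale of 5 extrapolates well past it, which is consistent with the table suggesting a strong reference scale and a neutral sketch scale.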
+
+ For artifacts like spatial entanglement like this
  ![img](assets/entanglement.png)
  Please activate background enhance (optionally with foreground enhance).
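
Reviewer note: besides background enhance, the options table lists "Pad reference with margin", which pads the reference to T times its current width to reduce spatial entanglement. The arithmetic can be sketched as follows. The function name `margin_padding`, the symmetric left/right split, and the rounding are assumptions, since the diff does not say how the margin is distributed or filled.

```python
from typing import Tuple


def margin_padding(width: int, t: float) -> Tuple[int, int]:
    """Return (left, right) padding in pixels so the padded width is about t * width.

    Assumption: the margin is split evenly between both sides, with any odd
    pixel going to the right. The repository may pad differently.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if t < 1:
        raise ValueError("t must be at least 1, padding cannot shrink the image")
    total = round(width * t) - width
    left = total // 2
    return left, total - left


if __name__ == "__main__":
    left, right = margin_padding(512, 1.5)
    print(left, right, 512 + left + right)  # 128 128 768
```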

+ ## Manipulation
+
+ -------------------------------------------------------------------------------------------
+ The colorization results can be manipulated using text prompts, see [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
+
+ It is now deactivated by default. To activate it, use
+ ```shell
+ python -u app.py -manipulate
+ ```

  For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.

81
  ![img](assets/preview2.png)
82
  As you can see, the manipluation unavoidably changed some unrelated regions as it is taken on the reference embeddings.
83
 
84
+ #### Manipulation options
85
  | Options | Description |
86
  | :----- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
87
  | Group index | The index of selected manipulation sequences's parameter group. |
 
  year = {2025},
  doi = {10.48550/arXiv.2502.19937},
  }
+
+ @article{yan2025colorizediffusionv2enhancingreferencebased,
+ title={ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities},
+ author={Dingkun Yan and Xinrui Wang and Yusuke Iwasawa and Yutaka Matsuo and Suguru Saito and Jiaxian Guo},
+ year={2025},
+ journal = {arXiv e-prints},
+ doi = {10.48550/arXiv.2504.06895},
+ }