WindVChen committed on
Commit
6710c89
•
1 Parent(s): d2633d8
.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ .idea/*
2
+ logs/*
3
+ wandb/*
4
+ pretrained_models/*
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,13 +1,295 @@
1
- ---
2
- title: INR Harmon
3
- emoji: πŸŒ–
4
- colorFrom: yellow
5
- colorTo: green
6
- sdk: gradio
7
- sdk_version: 3.37.0
8
- app_file: app.py
9
- pinned: false
10
- license: apache-2.0
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ <div align="center">
2
+
3
+ <h1><a href="https://arxiv.org/abs/2303.01681">Dense Pixel-to-Pixel Harmonization via <br /> Continuous Image Representation</a></h1>
4
+
5
+
6
+ **[Jianqi Chen](https://windvchen.github.io/), [Yilan Zhang](https://scholar.google.com.hk/citations?hl=en&user=wZ4M4ecAAAAJ), [Zhengxia Zou](https://scholar.google.com.hk/citations?hl=en&user=DzwoyZsAAAAJ), [Keyan Chen](https://scholar.google.com.hk/citations?hl=en&user=5RF4ia8AAAAJ),
7
+ and [Zhenwei Shi](https://scholar.google.com.hk/citations?hl=en&user=kNhFWQIAAAAJ)**
8
+
9
+ ![](https://komarev.com/ghpvc/?username=windvchenINR-Harmonization&label=visitors)
10
+ ![GitHub stars](https://badgen.net/github/stars/windvchen/INR-Harmonization)
11
+ [![](https://img.shields.io/badge/license-Apache--2.0-blue)](#License)
12
+ [![](https://img.shields.io/badge/arXiv-2303.01681-b31b1b.svg)](https://arxiv.org/abs/2303.01681)
13
+ <a href="https://huggingface.co/spaces/WindVChen/INR-Harmon"><img alt="Huggingface" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-INR Harmonization-orange"></a>
14
+
15
+ </div>
16
+
17
+ <p align = "center">
18
+ <img src="assets/title_harmon.gif"/ width="200">
19
+ <img src="assets/title_any_image.gif"/ width="200">
20
+ <img src="assets/title_you_want.gif"/ width="200">
21
+ </p>
22
+
23
+ <div align="center">
24
+ <img src="assets/demo.gif" width="600">
25
+ </div>
26
+
27
+
28
+ ### Give us a :star: if this repo helps
29
+
30
+ This repository is the official implementation of ***HINet (or INR-Harmonization)***, which can achieve ***Arbitrary aspect ratio & Arbitrary resolution*** image harmonization. If you encounter any questions, please feel free to contact
+ us. You can create an issue or just send an email to windvchen@gmail.com. Any idea exchange and
+ discussion are also welcome.
33
+
34
+ ## Updates
35
+
36
+ [**07/21/2023**] We achieved that! 🎉🎉 With all **TODOs** complete! Try our [Huggingface Demo]() here!! You can also download this repository and run the GUI locally (refer to [cmd] here)! 🥳🥳
37
+
38
+ [**07/19/2023**] Hi everyone! We have added two new inference
+ scripts: [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py), with which you can harmonize
+ a ***square image*** quite fast,
+ and [inference_for_arbitrary_resolution_image.py](inference_for_arbitrary_resolution_image.py), with which you can harmonize
+ an image at any resolution ***(2K, 4K, 8K, JUST WHATEVER YOU WANT!!)***. Please check them out! 😉😉
43
+
44
+ A summary of the features of the different inference strategies (for more information, please refer to [Inference](#inference)):
45
+
46
+ | Features | [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py) | [inference_for_arbitrary_resolution_image.py](inference_for_arbitrary_resolution_image.py) |
47
+ |:-----------------------:|:----------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------:|
48
+ | Support Arbitrary Image | ❌ *(Only square images)* | ✅ *(Arbitrary aspect ratio, arbitrary resolution!!!)* |
+ | Speed | 🚀 *(Quite fast)* | 🚌 *(Relatively slower than the left one)* |
+ | Memory cost | 🌲 *(Quite low)* | 🏭 *(Relatively higher than the left one for the same resolution)* |
51
+
52
+ [**07/18/2023**] Check out our new work [***Diff-Harmonization***](https://github.com/WindVChen/Diff-Harmonization),
53
+ which is a **Zero-Shot Harmonization** method based on *Diffusion Models*! 😊
54
+
55
+ [**07/17/2023**] Pretrained weights have been released. Feel free to try them! 👋👋
56
+
57
+ [**07/16/2023**] The code is now publicly available. 🥳
58
+
59
+ [**03/06/2023**] Source code and pretrained models will be publicly accessible.
60
+
61
+ ## TODO
62
+
63
+ - [x] Initial code release.
64
+ - [x] Add pretrained model weights.
65
+ - [x] Add the efficient splitting strategy for inferencing on original resolution images.
66
+ - [x] Add Gradio demo.
67
+
68
+ ## Table of Contents
69
+
70
+ - [Abstract](#abstract)
71
+ - [Requirements](#requirements)
72
+ - [Training](#training)
73
+ - [Train in low resolution (LR) mode](#train-in-low-resolution--lr--mode)
74
+ - [Train in high resolution (HR) mode](#train-in-high-resolution--hr--mode--eg-2048x2048-)
75
+ - [Train in original resolution mode](#train-in-original-resolution-mode)
76
+ - [Evaluation](#evaluation)
77
+ - [Evaluation in low resolution (LR) mode](#evaluation-in-low-resolution--lr--mode)
78
+ - [Evaluation in high resolution (HR) mode](#evaluation-in-high-resolution--hr--mode--eg-2048x2048-)
79
+ - [Evaluation in original resolution mode](#evaluation-in-original-resolution-mode)
80
+ - [Inference](#inference)
81
+ - [Inference on square images (fast & low cost)](#inference-on-square-images--fast--low-cost-)
82
+ - [Inference on arbitrary resolution images (Support any resolution)](#Inference-on-arbitrary-resolution-images--slow-high-cost-but-support-any-resolution-)
83
+ - [Results](#results)
84
+ - [Citation & Acknowledgments](#citation--acknowledgments)
85
+ - [License](#license)
86
+
87
+ ## Abstract
88
+
89
+ ![HINet's framework](assets/network.png)
90
+
91
+ High-resolution (HR) image harmonization is of great significance in real-world applications such as image synthesis and
92
+ image editing. However, due to high memory costs, existing dense pixel-to-pixel harmonization methods mainly
+ focus on processing low-resolution (LR) images. Some recent works resort to combining with color-to-color
94
+ transformations but are either limited to certain resolutions or heavily depend on hand-crafted image filters. In this
95
+ work, we explore leveraging the implicit neural representation (INR) and propose a novel
96
+ ***image Harmonization method based on Implicit neural Networks (HINet)***, which to the best of our knowledge, is
97
+ ***the first dense pixel-to-pixel method applicable to HR images without any hand-crafted filter design***. Inspired by
98
+ the Retinex theory, we decouple the MLPs into two parts to respectively capture the content and environment of composite
99
+ images. A Low-Resolution Image Prior (LRIP) network is designed to alleviate the Boundary Inconsistency problem, and we
100
+ also propose new designs for the training and inference process. Extensive experiments have demonstrated the
101
+ effectiveness of our method compared with state-of-the-art methods. Furthermore, some interesting and practical
102
+ applications of the proposed method are explored.
103
+
104
+ ## Requirements
105
+
106
+ 1. Software Requirements
107
+ - Python: 3.8
108
+ - CUDA: 11.3
109
+ - cuDNN: 8.4.1
110
+
111
+ To install other requirements:
112
+
113
+ ```
114
+ pip install -r requirements.txt
115
+ ```
116
+
117
+ 2. Datasets
118
+ - We train and evaluate on the [iHarmony4 dataset](https://github.com/bcmi/Image-Harmonization-Dataset-iHarmony4).
119
+ Please download the dataset in advance, and arrange them into the following structure:
120
+
121
+ ```
122
+ ├── dataset_path
+    ├── HAdobe5k
+       ├── composite_images
+       ├── masks
+       ├── real_images
+    ├── HCOCO
+    ├── Hday2night
+    ├── HFlickr
+    IHD_test.txt
+    IHD_train.txt
132
+ ```
133
+
134
+ - Before training, we resize the HAdobe5k subdataset so that each side is smaller than 1024 pixels, which speeds up data
+ loading. The resizing script is provided in [resize_Adobe.py](tools/resize_Adobe.py); a minimal sketch of this step is shown after this list.
136
+
137
+ - For training or evaluating on the original resolution of the iHarmony4 dataset, please create a new `HAdobe5kori`
+ directory containing the original HAdobe5k images.
139
+
140
+ - If you want to train and evaluate only on the HAdobe5k subdataset (see Table 1 in the paper), you can modify
141
+ the `IHD_train.txt` and `IHD_test.txt` in [train.py](train.py) to only contain the HAdobe5k images.
142
+
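+ For reference, below is a minimal sketch of the resizing step mentioned above (the actual [resize_Adobe.py](tools/resize_Adobe.py) may differ in details; the paths and helper name here are only illustrative):
+
+ ```python
+ # Illustrative sketch: shrink every HAdobe5k image so its longer side fits within 1024.
+ import glob
+ import os
+
+ import cv2
+
+ def resize_long_side(img, limit=1024):
+     # Scale down (never up) so that max(height, width) <= limit, keeping the aspect ratio.
+     h, w = img.shape[:2]
+     scale = limit / max(h, w)
+     if scale >= 1.0:
+         return img
+     return cv2.resize(img, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_LINEAR)
+
+ src_root = "dataset_path/HAdobe5kori"  # original-resolution images (illustrative path)
+ dst_root = "dataset_path/HAdobe5k"     # resized copies used for training
+ for sub in ("composite_images", "masks", "real_images"):
+     os.makedirs(os.path.join(dst_root, sub), exist_ok=True)
+     for path in glob.glob(os.path.join(src_root, sub, "*")):
+         img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
+         # For binary masks, nearest-neighbor interpolation may be preferable.
+         cv2.imwrite(os.path.join(dst_root, sub, os.path.basename(path)), resize_long_side(img))
+ ```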
143
+ 3. Pre-trained Models
144
+ - We adopt [HRNetV2](https://github.com/HRNet/HRNet-Image-Classification) as our encoder. You can download the weight
+ from [here](https://onedrive.live.com/?authkey=%21AMkPimlmClRvmpw&id=F7FD0B7F26543CEB%21112&cid=F7FD0B7F26543CEB&parId=root&parQt=sharedby&parCid=C8304F01C1A85932&o=OneUp)
+ and save it in the `pretrained_models` directory.
148
+ - In the following table, we provide several model weights pretrained at different resolutions (corresponding to
+ Table 1 in the paper):
150
+
151
+ | Download Link | Model Descriptions |
152
+ |:--------------------------------------------------------:|:-------------------------------------------------------------------:|
153
+ | [Resolution_RAW_iHarmony4.pth][Resolution_RAW_iHarmony4] | Train by RSC strategy with original resolution iHarmony4 dataset |
154
+ | [Resolution_256_iHarmony4.pth][Resolution_256_iHarmony4] | Train with 256*256 resolution iHarmony4 dataset |
155
+ | [Resolution_RAW_HAdobe5K.pth][Resolution_RAW_HAdobe5K] | Train by RSC strategy with original resolution HAdobe5k subdataset |
156
+ | [Resolution_2048_HAdobe5K.pth][Resolution_2048_HAdobe5K] | Train by RSC strategy with 2048*2048 resolution HAdobe5k subdataset |
157
+ | [Resolution_1024_HAdobe5K.pth][Resolution_1024_HAdobe5K] | Train by RSC strategy with 1024*1024 resolution HAdobe5k subdataset |
158
+
159
+ [Resolution_RAW_iHarmony4]: https://drive.google.com/file/d/1O9faWNk54mIzMaGZ1tmgm0EJpH20a-Fl/view?usp=drive_link
160
+
161
+ [Resolution_256_iHarmony4]: https://drive.google.com/file/d/1xym96LTP9a75UseDWGW2KRN1gyl3HPyM/view?usp=sharing
162
+
163
+ [Resolution_RAW_HAdobe5K]: https://drive.google.com/file/d/1JeUS5inuOM0pASKfu-tK9K7E5pGkP570/view?usp=drive_link
164
+
165
+ [Resolution_2048_HAdobe5K]: https://drive.google.com/file/d/18RxTfZsPEoi6kSS_UVEsUBYRBHAl4MfB/view?usp=drive_link
166
+
167
+ [Resolution_1024_HAdobe5K]: https://drive.google.com/file/d/1cOY74mN8gIz66watyoobZ1knrigkQyb5/view?usp=sharing
168
+
169
+ ## Visualization GUI
170
+
171
+ We provide a GUI based on Gradio for visualizing the intermediate results of our method. You can run the following command to start it locally, or make use of our provided [Huggingface Space]().
172
+ ```bash
173
+ python app.py
174
+ ```
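+ **Note:** `app.py` looks for checkpoints in the `./pretrained_models` directory, so download the pretrained weights listed above and place them there before launching the GUI.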
175
+
176
+ ## Training
177
+
178
+ The intermediate output (including checkpoints, visualizations, and log.txt) will be saved in the directory `logs/exp`.
179
+
180
+ ### Train in low resolution (LR) mode
181
+
182
+ ```bash
183
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 256 --INR_input_size 256
184
+ ```
185
+
186
+ - `dataset_path`: the path of the iHarmony4 dataset.
187
+ - `base_size`: the size of the input image to encoder.
188
+ - `input_size`: the size of the target resolution.
189
+ - `INR_input_size`: the size of the input image to the INR decoder.
190
+ - `hr_train`: whether to train in high resolution (HR) mode, i.e., using RSC strategy (See Section 3.4 in the paper).
191
+ - `isFullRes`: whether to train in full/original resolution mode.
192
+
193
+ - (More parameter details can be found in the code.)
194
+
195
+ ### Train in high resolution (HR) mode (E.g., 2048x2048)
196
+
197
+ If you do **not use the RSC strategy**, the training command is as follows. (On a single RTX 3090, this leads to out-of-memory
+ even when `batch_size` is set to 2.)
199
+
200
+ ```bash
201
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048
202
+ ```
203
+
204
+ If you **use the RSC strategy**, the training command is as follows. (On a single RTX 3090, `batch_size` can be set up to 6.)
205
+
206
+ ```bash
207
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048 --hr_train
208
+ ```
209
+
210
+ ### Train in original resolution mode
211
+
212
+ ```bash
213
+ python train.py --dataset_path {dataset_path} --base_size 256 --hr_train --isFullRes
214
+ ```
215
+
216
+ ## Evaluation
217
+
218
+ The intermediate output (including visualizations and log.txt) will be saved in the directory `logs/test`.
219
+
220
+ **Notice:** Due to the resolution-agnostic characteristic of INR, you can evaluate the dataset at any resolution, no matter
+ which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.
222
+
223
+ ### Evaluation in low resolution (LR) mode
224
+
225
+ ```bash
226
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 256 --INR_input_size 256
227
+ ```
228
+
229
+ ### Evaluation in high resolution (HR) mode (E.g., 2048x2048)
230
+
231
+ ```bash
232
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 2048 --INR_input_size 2048
233
+ ```
234
+
235
+ ### Evaluation in original resolution mode
236
+
237
+ ```bash
238
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --hr_train --isFullRes
239
+ ```
240
+
241
+ ## Inference
242
+
243
+ We have provided demo images (2K and 6K) in [demo](demo). Feel free to play around with them.
244
+
245
+ **Notice:** Due to the resolution-agnostic characteristic of INR, you can run inference on images at any resolution, no matter
+ which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.
247
+
248
+ ### Inference on square images (fast & low cost)
249
+
250
+ If you want to run inference on square images, please use the command below. Note that this script only supports square images whose resolution is a multiple of 256. Any other requirements will be printed to the console (on error) when you run the code.
251
+
252
+ ```bash
253
+ python efficient_inference_for_square_image.py --split_resolution {split_resolution} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
254
+ ```
255
+ - `split_resolution`: the resolution of the split patches (e.g., 512 means the input image will be split into 512x512 patches; see the sketch after this list). These patches are finally assembled back to the resolution of the original image.
256
+ - `composite_image`: the path of the composite image. You can try with the provided images in [demo](demo).
257
+ - `mask`: the path of the mask. You can try with the provided masks in [demo](demo).
258
+ - `save_path`: the path of the output image.
259
+ - `pretrained`: the path of the pretrained weight.
260
+
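+ To make the splitting arithmetic concrete, here is a small illustrative sketch of how `split_resolution` determines the patch grid for a square image, mirroring the logic in [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py) (the helper name is only for illustration):
+
+ ```python
+ # Illustrative sketch of the patch grid implied by --split_resolution.
+ import math
+
+ def split_start_points(side, split_resolution):
+     # Top-left corners of the split_resolution x split_resolution patches covering
+     # a side x side image; the last row/column is shifted back so that patches
+     # never run past the image border.
+     num = math.ceil(side / split_resolution)
+     points = []
+     for i in range(num):
+         for j in range(num):
+             y = min(i * split_resolution, side - split_resolution)
+             x = min(j * split_resolution, side - split_resolution)
+             points.append((y, x))
+     return points
+
+ # A 2048x2048 composite with --split_resolution 512 gives a 4x4 grid of 16 patches.
+ print(len(split_start_points(2048, 512)))  # 16
+ ```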
261
+ ### Inference on arbitrary resolution images (slow, high cost, but support any resolution)
262
+ If the former inference script cannot meet your needs and you want to run inference on arbitrary resolution images, please use the command below. Note that this script is slower and costs more memory at the same resolution (***but it supports arbitrary resolutions***).
263
+
264
+ If you encounter an out-of-memory error, please try to increase the `split_num` parameter below. (The script also prints hints that guide you to do this.)
265
+ ```bash
266
+ python inference_for_arbitrary_resolution_image.py --split_num {split_num} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
267
+ ```
268
+ - `split_num`: the number of splits for the input image. (E.g., 4 means the input image will be split into 4x4=16 patches.)
269
+ - `composite_image`: the path of the composite image. You can try with the provided images in [demo](demo).
270
+ - `mask`: the path of the mask. You can try with the provided masks in [demo](demo).
271
+ - `save_path`: the path of the output image.
272
+ - `pretrained`: the path of the pretrained weight.
273
+
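+ As a rule of thumb, `split_num = N` yields an N x N grid of patches, so each forward pass only covers roughly 1/N^2 of the pixels; increasing `split_num` therefore lowers peak memory at the cost of more (smaller) forward passes.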
274
+ ## Results
275
+
276
+ ![Metrics](assets/metrics.png#pic_center)
277
+ ![Visual comparisons](assets/visualizations.png#pic_center)
278
+ ![Visual comparisons2](assets/visualizations2.png#pic_center)
279
+
280
+ ## Citation & Acknowledgments
281
+
282
+ If you find this paper useful in your research, please consider citing:
283
+
284
+ ```
285
+ @article{chen2023dense,
286
+ title={Dense Pixel-to-Pixel Harmonization via Continuous Image Representation},
287
+ author={Chen, Jianqi and Zhang, Yilan and Zou, Zhengxia and Chen, Keyan and Shi, Zhenwei},
288
+ journal={arXiv preprint arXiv:2303.01681},
289
+ year={2023}
290
+ }
291
+ ```
292
+
293
+ ## License
294
+
295
+ This project is licensed under the Apache-2.0 license. See [LICENSE](LICENSE) for details.
app.py ADDED
@@ -0,0 +1,279 @@
1
+ import os
2
+
3
+ import cv2
4
+
5
+ import gradio as gr
6
+ import numpy as np
7
+ import sys
8
+ import io
9
+ import torch
10
+
11
+
12
+ class Logger:
13
+ def __init__(self):
14
+ self.terminal = sys.stdout
15
+ self.log = io.BytesIO()
16
+
17
+ def write(self, message):
18
+ self.terminal.write(message)
19
+ self.log.write(bytes(message, encoding='utf-8'))
20
+
21
+ def flush(self):
22
+ self.terminal.flush()
23
+ self.log.flush()
24
+
25
+ def isatty(self):
26
+ return False
27
+
28
+
29
+ log = Logger()
30
+ sys.stdout = log
31
+
32
+ def read_logs():
33
+ out = log.log.getvalue().decode()
34
+ if out.count("\n") >= 30:
35
+ log.log = io.BytesIO()
36
+ sys.stdout.flush()
37
+ return out
38
+
39
+
40
+ with gr.Blocks() as app:
41
+ gr.Markdown("""
42
+ # HINet (or INR-Harmonization) - A novel image Harmonization method based on Implicit neural Networks
43
+ ## Harmonize any image you want! Arbitrary resolution, and arbitrary aspect ratio!
44
+ ### Official Gradio Demo
45
+ **Since Gradio Spaces only support CPU, the speed may be somewhat slow. You may prefer to download the code and run it locally with a GPU.**
46
+ <a href="https://huggingface.co/spaces/WindVChen/INR-Harmon?duplicate=true" style="display: inline-block;margin-top: .5em;margin-right: .25em;" target="_blank">
47
+ <img style="margin-bottom: 0em;display: inline;margin-top: -.25em;" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a> for no queue on your own hardware.</p>
48
+ * Official Repo: [INR-Harmonization](https://github.com/WindVChen/INR-Harmonization)
49
+ """)
50
+
51
+ valid_checkpoints_dict = {"Resolution_256_iHarmony4": "Resolution_256_iHarmony4.pth",
52
+ "Resolution_1024_HAdobe5K": "Resolution_1024_HAdobe5K.pth",
53
+ "Resolution_2048_HAdobe5K": "Resolution_2048_HAdobe5K.pth",
54
+ "Resolution_RAW_HAdobe5K": "Resolution_RAW_HAdobe5K.pth",
55
+ "Resolution_RAW_iHarmony4": "Resolution_RAW_iHarmony4.pth"}
56
+
57
+ global_state = gr.State({
58
+ 'pretrained_weight': valid_checkpoints_dict["Resolution_RAW_iHarmony4"],
59
+
60
+ })
61
+ with gr.Row():
62
+ form_composite_image = gr.Image(label='Input Composite image', type='pil').style(height="auto")
63
+ form_mask_image = gr.Image(label='Input Mask image', type='pil', interactive=False).style(
64
+ height="auto")
65
+ with gr.Row():
66
+ with gr.Column(scale=4):
67
+ with gr.Row():
68
+ with gr.Column(scale=2, min_width=10):
69
+ gr.Markdown(value='Model Selection', show_label=False)
70
+
71
+ with gr.Column(scale=4, min_width=10):
72
+ form_pretrained_dropdown = gr.Dropdown(
73
+ choices=list(valid_checkpoints_dict.values()),
74
+ label="Pretrained Model",
75
+ value=valid_checkpoints_dict["Resolution_RAW_iHarmony4"],
76
+ interactive=True
77
+ )
78
+
79
+ with gr.Row():
80
+ with gr.Column(scale=2, min_width=10):
81
+ gr.Markdown(value='Inference Mode', show_label=False)
82
+
83
+ with gr.Column(scale=4, min_width=10):
84
+ form_inference_mode = gr.Radio(
85
+ ['Square Image', 'Arbitrary Image'],
86
+ value='Arbitrary Image',
87
+ interactive=False,
88
+ label='Mode',
89
+ )
90
+
91
+ with gr.Row():
92
+ with gr.Column(scale=2, min_width=10):
93
+ gr.Markdown(value='Split Parameter', show_label=False)
94
+
95
+ with gr.Column(scale=4, min_width=10):
96
+ form_split_res = gr.Slider(
97
+ minimum=0,
98
+ maximum=2048,
99
+ step=128,
100
+ value=256,
101
+ interactive=False,
102
+ label="Split Resolution",
103
+ )
104
+ form_split_num = gr.Number(
105
+ value=8,
106
+ interactive=False,
107
+ label="Split Number")
108
+ with gr.Row():
109
+ form_log = gr.Textbox(read_logs, label="Logs", interactive=False, type="text", every=1)
110
+
111
+ with gr.Column(scale=4):
112
+ form_harmonized_image = gr.Image(label='Harmonized Result', type='numpy', interactive=False).style(
113
+ height="auto")
114
+ form_start_btn = gr.Button("Start Harmonization", interactive=False)
115
+ form_reset_btn = gr.Button("Reset", interactive=True)
116
+
117
+
118
+ def on_change_form_composite_image(form_composite_image):
119
+ if form_composite_image is None:
120
+ return gr.update(interactive=False, value=None), gr.update(value=None)
121
+ return gr.update(interactive=True), gr.update(value=None)
122
+
123
+
124
+ def on_change_form_mask_image(form_composite_image, form_mask_image):
125
+ if form_mask_image is None:
126
+ return gr.update(interactive=False if form_composite_image is None else True), gr.update(
127
+ interactive=False), gr.update(interactive=False), gr.update(
128
+ interactive=False), gr.update(interactive=False), gr.update(value=None)
129
+
130
+ if form_composite_image.size[:2] != form_mask_image.size[:2]:
131
+ raise gr.Error("Composite image and mask image should have the same resolution!")
132
+ else:
133
+ w, h = form_composite_image.size[:2]
134
+ if h != w or (h % 16 != 0):
135
+ return gr.update(value='Arbitrary Image', interactive=False), gr.update(interactive=True), gr.update(
136
+ interactive=True), gr.update(interactive=True), gr.update(interactive=False,
137
+ value=-1), gr.update(value=None)
138
+ else:
139
+ return gr.update(value='Square Image', interactive=True), gr.update(interactive=True), gr.update(
140
+ interactive=True), gr.update(interactive=False), gr.update(interactive=True,
141
+ value=h // 16,
142
+ maximum=h,
143
+ minimum=h // 16,
144
+ step=h // 16), gr.update(value=None)
145
+
146
+
147
+ form_composite_image.change(
148
+ on_change_form_composite_image,
149
+ inputs=[form_composite_image],
150
+ outputs=[form_mask_image, form_harmonized_image]
151
+ )
152
+
153
+ form_mask_image.change(
154
+ on_change_form_mask_image,
155
+ inputs=[form_composite_image, form_mask_image],
156
+ outputs=[form_inference_mode, form_mask_image, form_start_btn, form_split_num, form_split_res,
157
+ form_harmonized_image]
158
+ )
159
+
160
+
161
+ def on_change_form_split_num(form_composite_image, form_split_num):
162
+ w, h = form_composite_image.size[:2]
163
+ if form_split_num < 1:
164
+ return gr.update(value=1)
165
+ elif form_split_num > min(w, h):
166
+ return gr.update(value=min(w, h))
167
+ else:
168
+ return gr.update(value=form_split_num)
169
+
170
+
171
+ form_split_num.change(
172
+ on_change_form_split_num,
173
+ inputs=[form_composite_image, form_split_num],
174
+ outputs=[form_split_num]
175
+ )
176
+
177
+
178
+ def on_change_form_inference_mode(form_inference_mode):
179
+ if form_inference_mode == "Square Image":
180
+ return gr.update(interactive=True), gr.update(interactive=False)
181
+ else:
182
+ return gr.update(interactive=False), gr.update(interactive=True)
183
+
184
+
185
+ form_inference_mode.change(on_change_form_inference_mode, inputs=[form_inference_mode],
186
+ outputs=[form_split_res, form_split_num])
187
+
188
+
189
+ def on_click_form_start_btn(form_composite_image, form_mask_image, form_pretrained_dropdown, form_inference_mode,
190
+ form_split_res, form_split_num):
191
+ log.log = io.BytesIO()
192
+ if form_inference_mode == "Square Image":
193
+ from efficient_inference_for_square_image import parse_args, main_process
194
+ opt = parse_args()
195
+ opt.transform_mean = [.5, .5, .5]
196
+ opt.transform_var = [.5, .5, .5]
197
+ opt.pretrained = os.path.join("./pretrained_models", form_pretrained_dropdown)
198
+ opt.split_resolution = form_split_res
199
+ opt.save_path = None
200
+ opt.workers = 0
201
+ opt.device = "cuda" if torch.cuda.is_available() else "cpu"
202
+
203
+ composite_image = np.asarray(form_composite_image)
204
+ mask = np.asarray(form_mask_image)
205
+
206
+ try:
207
+ return cv2.cvtColor(
208
+ main_process(opt, composite_image=composite_image, mask=mask),
209
+ cv2.COLOR_BGR2RGB)
210
+ except:
211
+ raise gr.Error("Patches too big. Try to reduce the `split_res`!")
212
+
213
+ else:
214
+ from inference_for_arbitrary_resolution_image import parse_args, main_process
215
+ opt = parse_args()
216
+ opt.transform_mean = [.5, .5, .5]
217
+ opt.transform_var = [.5, .5, .5]
218
+ opt.pretrained = os.path.join("./pretrained_models", form_pretrained_dropdown)
219
+ opt.split_num = int(form_split_num)
220
+ opt.save_path = None
221
+ opt.workers = 0
222
+ opt.device = "cuda" if torch.cuda.is_available() else "cpu"
223
+
224
+ composite_image = np.asarray(form_composite_image)
225
+ mask = np.asarray(form_mask_image)
226
+
227
+ try:
228
+ return cv2.cvtColor(
229
+ main_process(opt, composite_image=composite_image, mask=mask),
230
+ cv2.COLOR_BGR2RGB)
231
+ except:
232
+ raise gr.Error("Patches too big. Try to increase the `split_num`!")
233
+
234
+
235
+ form_start_btn.click(on_click_form_start_btn,
236
+ inputs=[form_composite_image, form_mask_image, form_pretrained_dropdown, form_inference_mode,
237
+ form_split_res, form_split_num], outputs=[form_harmonized_image])
238
+
239
+
240
+ def on_click_form_reset_btn():
241
+ log.log = io.BytesIO()
242
+ return gr.update(value=None), gr.update(value=None, interactive=True), gr.update(value=None,
243
+ interactive=False), gr.update(
244
+ interactive=False)
245
+
246
+
247
+ form_reset_btn.click(on_click_form_reset_btn,
248
+ inputs=None, outputs=[form_log, form_composite_image, form_mask_image, form_start_btn])
249
+
250
+ gr.Markdown("""
251
+ ## Quick Start
252
+ 1. Select desired `Pretrained Model`.
253
+ 2. Select a composite image, and then a mask with the same size.
254
+ 3. Select the inference mode (for non-square images, only `Arbitrary Image` is supported).
255
+ 4. Set `Split Resolution` (the patches' resolution) or `Split Number` (how many patches, about N*N) according to the inference mode.
+ 5. Click `Start Harmonization` and enjoy it!
257
+
258
+ """)
259
+ gr.HTML("""
260
+ <style>
261
+ .container {
262
+ position: absolute;
263
+ height: 50px;
264
+ text-align: center;
265
+ line-height: 50px;
266
+ width: 100%;
267
+ }
268
+ </style>
269
+ <div class="container">
270
+ Gradio demo supported by
271
+ <a href="https://github.com/WindVChen">WindVChen</a>
272
+ </div>
273
+ """)
274
+
275
+ gr.close_all()
276
+
277
+ app.queue(concurrency_count=1, max_size=200, api_open=False)
278
+
279
+ app.launch(show_api=False, server_port=12345)
efficient_inference_for_square_image.py ADDED
@@ -0,0 +1,345 @@
1
+ import argparse
2
+
3
+ import torch.backends.cudnn as cudnn
4
+ import torchvision.transforms as transforms
5
+ from torch.utils.data import DataLoader
6
+
7
+ from model.build_model import build_model
8
+
9
+ import torch
10
+ import cv2
11
+ import numpy as np
12
+ import torchvision
13
+ import os
14
+ import tqdm
15
+ import time
16
+
17
+ from utils.misc import prepare_cooridinate_input, customRandomCrop
18
+
19
+ from datasets.build_INR_dataset import Implicit2DGenerator
20
+ import albumentations
21
+ from albumentations import Resize
22
+ from torch.utils.data import DataLoader
23
+ from utils.misc import normalize
24
+
25
+ import math
26
+
27
+
28
+ class single_image_dataset(torch.utils.data.Dataset):
29
+ def __init__(self, opt, composite_image=None, mask=None):
30
+ super().__init__()
31
+
32
+ self.opt = opt
33
+
34
+ if composite_image is None:
35
+ composite_image = cv2.imread(opt.composite_image)
36
+ composite_image = cv2.cvtColor(composite_image, cv2.COLOR_BGR2RGB)
37
+ self.composite_image = composite_image
38
+
39
+ assert composite_image.shape[0] == composite_image.shape[1], "This faster script only supports square images."
40
+ assert composite_image.shape[
41
+ 0] % 256 == 0, "This faster script only supports images with resolution multiples of 256."
42
+ assert opt.split_resolution % (composite_image.shape[
43
+ 0] // 16) == 0, f"The image resolution is {composite_image.shape[0]}, " \
44
+ f"you should set {opt.split_resolution} to multiplies of {composite_image.shape[0] // 16}"
45
+
46
+ if mask is None:
47
+ mask = cv2.imread(opt.mask)
48
+ mask = mask[:, :, 0].astype(np.float32) / 255.
49
+ self.mask = mask
50
+
51
+ self.torch_transforms = transforms.Compose([transforms.ToTensor(),
52
+ transforms.Normalize([.5, .5, .5], [.5, .5, .5])])
53
+ self.INR_dataset = Implicit2DGenerator(opt, 'Val')
54
+
55
+ self.split_width_resolution = self.split_height_resolution = opt.split_resolution
56
+
57
+ self.num_w = math.ceil(composite_image.shape[1] / self.split_width_resolution)
58
+ self.num_h = math.ceil(composite_image.shape[0] / self.split_height_resolution)
59
+
60
+ self.split_start_point = []
61
+
62
+ "Split the image into several parts."
63
+ for i in range(self.num_h):
64
+ for j in range(self.num_w):
65
+ if i == composite_image.shape[0] // self.split_height_resolution:
66
+ if j == composite_image.shape[1] // self.split_width_resolution:
67
+ self.split_start_point.append((composite_image.shape[0] - self.split_height_resolution,
68
+ composite_image.shape[1] - self.split_width_resolution))
69
+ else:
70
+ self.split_start_point.append(
71
+ (composite_image.shape[0] - self.split_height_resolution, j * self.split_width_resolution))
72
+ else:
73
+ if j == composite_image.shape[1] // self.split_width_resolution:
74
+ self.split_start_point.append(
75
+ (i * self.split_height_resolution, composite_image.shape[1] - self.split_width_resolution))
76
+ else:
77
+ self.split_start_point.append(
78
+ (i * self.split_height_resolution, j * self.split_width_resolution))
79
+
80
+ assert len(self.split_start_point) == self.num_w * self.num_h
81
+
82
+ print(
83
+ f"The image will be split into {self.num_h} pieces in height, and {self.num_w} pieces in width. Totally {self.num_h * self.num_w} patches.")
84
+ print(f"The final resolution of each patch is {self.split_height_resolution} x {self.split_width_resolution}")
85
+
86
+ def __len__(self):
87
+ return self.num_w * self.num_h
88
+
89
+ def __getitem__(self, idx):
90
+ composite_image = self.composite_image
91
+
92
+ mask = self.mask
93
+
94
+ full_coord = prepare_cooridinate_input(mask).transpose(1, 2, 0)
95
+
96
+ tmp_transform = albumentations.Compose([Resize(self.opt.base_size, self.opt.base_size)],
97
+ additional_targets={'object_mask': 'image'})
98
+ transform_out = tmp_transform(image=self.composite_image, object_mask=self.mask)
99
+ compos_list = [self.torch_transforms(transform_out['image'])]
100
+ mask_list = [
101
+ torchvision.transforms.ToTensor()(transform_out['object_mask'][..., np.newaxis].astype(np.float32))]
102
+ coord_map_list = []
103
+
104
+ if composite_image.shape[0] != self.split_height_resolution:
105
+ c_h = self.split_start_point[idx][0] / (composite_image.shape[0] - self.split_height_resolution)
106
+ else:
107
+ c_h = 0
108
+ if composite_image.shape[1] != self.split_width_resolution:
109
+ c_w = self.split_start_point[idx][1] / (composite_image.shape[1] - self.split_width_resolution)
110
+ else:
111
+ c_w = 0
112
+ transform_out, c_h, c_w = customRandomCrop([composite_image, mask, full_coord],
113
+ self.split_height_resolution, self.split_width_resolution, c_h, c_w)
114
+
115
+ compos_list.append(self.torch_transforms(transform_out[0]))
116
+ mask_list.append(
117
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
118
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
119
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
120
+ for n in range(2):
121
+ tmp_comp = cv2.resize(composite_image, (
122
+ composite_image.shape[1] // 2 ** (n + 1), composite_image.shape[0] // 2 ** (n + 1)))
123
+ tmp_mask = cv2.resize(mask, (mask.shape[1] // 2 ** (n + 1), mask.shape[0] // 2 ** (n + 1)))
124
+ tmp_coord = prepare_cooridinate_input(tmp_mask).transpose(1, 2, 0)
125
+
126
+ transform_out, c_h, c_w = customRandomCrop([tmp_comp, tmp_mask, tmp_coord],
127
+ self.split_height_resolution // 2 ** (n + 1),
128
+ self.split_width_resolution // 2 ** (n + 1), c_h, c_w)
129
+ compos_list.append(self.torch_transforms(transform_out[0]))
130
+ mask_list.append(
131
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
132
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
133
+ out_comp = compos_list
134
+ out_mask = mask_list
135
+ out_coord = coord_map_list
136
+
137
+ fg_INR_coordinates, bg_INR_coordinates, fg_INR_RGB, fg_transfer_INR_RGB, bg_INR_RGB = self.INR_dataset.generator(
138
+ self.torch_transforms, transform_out[0], transform_out[0], mask)
139
+
140
+ return {
141
+ 'composite_image': out_comp,
142
+ 'mask': out_mask,
143
+ 'coordinate_map': out_coord,
144
+ 'composite_image0': out_comp[0],
145
+ 'mask0': out_mask[0],
146
+ 'coordinate_map0': out_coord[0],
147
+ 'composite_image1': out_comp[1],
148
+ 'mask1': out_mask[1],
149
+ 'coordinate_map1': out_coord[1],
150
+ 'composite_image2': out_comp[2],
151
+ 'mask2': out_mask[2],
152
+ 'coordinate_map2': out_coord[2],
153
+ 'composite_image3': out_comp[3],
154
+ 'mask3': out_mask[3],
155
+ 'coordinate_map3': out_coord[3],
156
+ 'fg_INR_coordinates': fg_INR_coordinates,
157
+ 'bg_INR_coordinates': bg_INR_coordinates,
158
+ 'fg_INR_RGB': fg_INR_RGB,
159
+ 'fg_transfer_INR_RGB': fg_transfer_INR_RGB,
160
+ 'bg_INR_RGB': bg_INR_RGB,
161
+ 'start_point': self.split_start_point[idx],
162
+ 'start_proportion': [self.split_start_point[idx][0] / (composite_image.shape[0]),
163
+ self.split_start_point[idx][1] / (composite_image.shape[1]),
164
+ (self.split_start_point[idx][0] + self.split_height_resolution) / (
165
+ composite_image.shape[0]),
166
+ (self.split_start_point[idx][1] + self.split_width_resolution) / (
167
+ composite_image.shape[1])],
168
+ }
169
+
170
+
171
+ def parse_args():
172
+ parser = argparse.ArgumentParser()
173
+
174
+ parser.add_argument('--split_resolution', type=int, default=2048,
175
+ help='The resolution of the patch split.')
176
+
177
+ parser.add_argument('--composite_image', type=str, default=r'./demo/demo_2k_composite.jpg',
178
+ help='composite image path')
179
+
180
+ parser.add_argument('--mask', type=str, default=r'./demo/demo_2k_mask.jpg',
181
+ help='mask path')
182
+
183
+ parser.add_argument('--save_path', type=str, default=r'./demo/',
184
+ help='save path')
185
+
186
+ parser.add_argument('--workers', type=int, default=8,
187
+ metavar='N', help='Dataloader threads.')
188
+
189
+ parser.add_argument('--batch_size', type=int, default=1,
190
+ help='You can override model batch size by specify positive number.')
191
+
192
+ parser.add_argument('--device', type=str, default='cuda',
193
+ help="Whether use cuda, 'cuda' or 'cpu'.")
194
+
195
+ parser.add_argument('--base_size', type=int, default=256,
196
+ help='Base size. Resolution of the image input into the Encoder')
197
+
198
+ parser.add_argument('--input_size', type=int, default=256,
199
+ help='Input size. Resolution of the image that want to be generated by the Decoder')
200
+
201
+ parser.add_argument('--INR_input_size', type=int, default=256,
202
+ help='INR input size. Resolution of the image that want to be generated by the Decoder. '
203
+ 'Should be the same as `input_size`')
204
+
205
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
206
+ help='Number of channels for INR linear layer.')
207
+
208
+ parser.add_argument('--LUT_dim', type=int, default=7,
209
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
210
+
211
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
212
+ help='INR activation layer type: leakyrelu_pe, sine')
213
+
214
+ parser.add_argument('--pretrained', type=str,
215
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
216
+ help='Pretrained weight path')
217
+
218
+ parser.add_argument('--param_factorize_dim', type=int,
219
+ default=10,
220
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
221
+ 'Refer to https://arxiv.org/abs/2011.12026')
222
+
223
+ parser.add_argument('--embedding_type', type=str,
224
+ default="CIPS_embed",
225
+ help='Which embedding_type to use.')
226
+
227
+ parser.add_argument('--INRDecode', action="store_false",
228
+ help='Whether INR decoder. Set it to False if you want to test the baseline '
229
+ '(https://github.com/SamsungLabs/image_harmonization)')
230
+
231
+ parser.add_argument('--isMoreINRInput', action="store_false",
232
+ help='Whether to cat RGB and mask. See Section 3.4 in the paper.')
233
+
234
+ parser.add_argument('--hr_train', action="store_false",
235
+ help='Whether use hr_train. See section 3.4 in the paper.')
236
+
237
+ parser.add_argument('--isFullRes', action="store_true",
238
+ help='Whether for original resolution. See section 3.4 in the paper.')
239
+
240
+ opt = parser.parse_args()
241
+
242
+ assert opt.batch_size == 1, 'This faster script only supports batch size 1 for inference.'
243
+
244
+ return opt
245
+
246
+
247
+ @torch.no_grad()
248
+ def inference(model, opt, composite_image=None, mask=None):
249
+ model.eval()
250
+
251
+ "dataset here is actually consisted of several patches of a single image."
252
+ singledataset = single_image_dataset(opt, composite_image, mask)
253
+
254
+ single_data_loader = DataLoader(singledataset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
255
+ num_workers=opt.workers, persistent_workers=False if composite_image is not None else True)
256
+
257
+ "Init a pure black image with the same size as the input image."
258
+ init_img = np.zeros_like(singledataset.composite_image)
259
+
260
+ time_all = 0
261
+
262
+ for step, batch in tqdm.tqdm(enumerate(single_data_loader)):
263
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
264
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
265
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
266
+ start_points = batch['start_point']
267
+ start_proportion = batch['start_proportion']
268
+
269
+ if opt.batch_size == 1:
270
+ start_points = [torch.cat(start_points)]
271
+ start_proportion = [torch.cat(start_proportion)]
272
+
273
+ fg_INR_coordinates = coordinate_map[1:]
274
+
275
+ try:
276
+ if step == 0: # This is for CUDA Kernel Warm-up, or the first inference step will be quite slow.
277
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
278
+ composite_image,
279
+ mask,
280
+ fg_INR_coordinates, start_proportion[0]
281
+ )
282
+ if opt.device == "cuda":
283
+ torch.cuda.reset_max_memory_allocated()
284
+ torch.cuda.reset_max_memory_cached()
285
+ start_time = time.time()
286
+ torch.cuda.synchronize()
287
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
288
+ composite_image,
289
+ mask,
290
+ fg_INR_coordinates, start_proportion[0]
291
+ )
292
+ if opt.device == "cuda":
293
+ torch.cuda.synchronize()
294
+ end_time = time.time()
295
+
296
+ end_max_memory = torch.cuda.max_memory_allocated() // 1024 ** 2
297
+ end_memory = torch.cuda.memory_allocated() // 1024 ** 2
298
+
299
+ print(f'GPU max memory usage: {end_max_memory} MB')
300
+ print(f'GPU memory usage: {end_memory} MB')
301
+ time_all += (end_time - start_time)
302
+ print(f'progress: {step} / {len(single_data_loader)}')
303
+ except:
304
+ raise Exception(
305
+ f'The image resolution is too large. Please reduce the `split_resolution` value. Your current setting is {opt.split_resolution}.')
306
+
307
+ "Assemble the every patch's harmonized result into the final whole image."
308
+ for id in range(len(fg_INR_coordinates[0])):
309
+ pred_fg_image = fg_content_bg_appearance_construct[-1][id]
310
+ pred_harmonized_image = pred_fg_image * (mask[1][id] > 100 / 255.) + composite_image[1][id] * (
311
+ ~(mask[1][id] > 100 / 255.))
312
+
313
+ pred_harmonized_tmp = cv2.cvtColor(
314
+ normalize(pred_harmonized_image.unsqueeze(0), opt, 'inv')[0].permute(1, 2, 0).cpu().mul_(255.).clamp_(
315
+ 0., 255.).numpy().astype(np.uint8), cv2.COLOR_RGB2BGR)
316
+
317
+ init_img[start_points[id][0]:start_points[id][0] + singledataset.split_height_resolution,
318
+ start_points[id][1]:start_points[id][1] + singledataset.split_width_resolution] = pred_harmonized_tmp
319
+
320
+ print(f'Inference time: {time_all}')
321
+ if opt.save_path is not None:
322
+ os.makedirs(opt.save_path, exist_ok=True)
323
+ cv2.imwrite(os.path.join(opt.save_path, "pred_harmonized_image.jpg"), init_img)
324
+ return init_img
325
+
326
+
327
+ def main_process(opt, composite_image=None, mask=None):
328
+ cudnn.benchmark = True
329
+
330
+ model = build_model(opt).to(opt.device)
331
+
332
+ load_dict = torch.load(opt.pretrained)['model']
333
+ for k in load_dict.keys():
334
+ if k not in model.state_dict().keys():
335
+ print(f"Skip {k}")
336
+ model.load_state_dict(load_dict, strict=False)
337
+
338
+ return inference(model, opt, composite_image, mask)
339
+
340
+
341
+ if __name__ == '__main__':
342
+ opt = parse_args()
343
+ opt.transform_mean = [.5, .5, .5]
344
+ opt.transform_var = [.5, .5, .5]
345
+ main_process(opt)
inference.py ADDED
@@ -0,0 +1,236 @@
1
+ import os
2
+ import argparse
3
+
4
+ import albumentations
5
+ from albumentations import Resize
6
+
7
+ import torch
8
+ import torch.backends.cudnn as cudnn
9
+ import torchvision.transforms as transforms
10
+ from torch.utils.data import DataLoader
11
+
12
+ from model.build_model import build_model
13
+ from datasets.build_dataset import dataset_generator
14
+
15
+ from utils import misc, metrics
16
+
17
+
18
+ def parse_args():
19
+ parser = argparse.ArgumentParser()
20
+
21
+ parser.add_argument('--workers', type=int, default=1,
22
+ metavar='N', help='Dataloader threads.')
23
+
24
+ parser.add_argument('--batch_size', type=int, default=1,
25
+ help='You can override model batch size by specify positive number.')
26
+
27
+ parser.add_argument('--device', type=str, default='cuda',
28
+ help="Whether use cuda, 'cuda' or 'cpu'.")
29
+
30
+ parser.add_argument('--save_path', type=str, default="./logs",
31
+ help='Where to save logs and checkpoints.')
32
+
33
+ parser.add_argument('--dataset_path', type=str, default=r".\iHarmony4",
34
+ help='Dataset path.')
35
+
36
+ parser.add_argument('--base_size', type=int, default=256,
37
+ help='Base size. Resolution of the image input into the Encoder')
38
+
39
+ parser.add_argument('--input_size', type=int, default=256,
40
+ help='Input size. Resolution of the image to be generated by the Decoder.')
41
+
42
+ parser.add_argument('--INR_input_size', type=int, default=256,
43
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
44
+ 'Should be the same as `input_size`')
45
+
46
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
47
+ help='Number of channels for INR linear layer.')
48
+
49
+ parser.add_argument('--LUT_dim', type=int, default=7,
50
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
51
+
52
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
53
+ help='INR activation layer type: leakyrelu_pe, sine')
54
+
55
+ parser.add_argument('--pretrained', type=str,
56
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
57
+ help='Pretrained weight path')
58
+
59
+ parser.add_argument('--param_factorize_dim', type=int,
60
+ default=10,
61
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
62
+ 'Refer to https://arxiv.org/abs/2011.12026')
63
+
64
+ parser.add_argument('--embedding_type', type=str,
65
+ default="CIPS_embed",
66
+ help='Which embedding_type to use.')
67
+
68
+ parser.add_argument('--optim', type=str,
69
+ default='adamw',
70
+ help='Which optimizer to use.')
71
+
72
+ parser.add_argument('--INRDecode', action="store_false",
73
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
74
+ '(https://github.com/SamsungLabs/image_harmonization)')
75
+
76
+ parser.add_argument('--isMoreINRInput', action="store_false",
77
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
78
+
79
+ parser.add_argument('--hr_train', action="store_true",
80
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
81
+
82
+ parser.add_argument('--isFullRes', action="store_true",
83
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
84
+
85
+ opt = parser.parse_args()
86
+
87
+ opt.save_path = misc.increment_path(os.path.join(opt.save_path, "test1"))
88
+
89
+ return opt
90
+
91
+
92
+ def inference(val_loader, model, logger, opt):
93
+ current_process = 10
94
+ model.eval()
95
+
96
+ metric_log = {
97
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
98
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
99
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
100
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
101
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
102
+ }
103
+
104
+ lut_metric_log = {
105
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
106
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
107
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
108
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
109
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
110
+ }
111
+
112
+ for step, batch in enumerate(val_loader):
113
+ composite_image = batch['composite_image'].to(opt.device)
114
+ real_image = batch['real_image'].to(opt.device)
115
+ mask = batch['mask'].to(opt.device)
116
+ category = batch['category']
117
+
118
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
119
+
120
+ with torch.no_grad():
121
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
122
+ composite_image,
123
+ mask,
124
+ fg_INR_coordinates,
125
+ )
126
+
127
+ if opt.INRDecode:
128
+ pred_fg_image = fg_content_bg_appearance_construct[-1]
129
+ else:
130
+ pred_fg_image = misc.lin2img(fg_content_bg_appearance_construct,
131
+ val_loader.dataset.INR_dataset.size) if fg_content_bg_appearance_construct is not None else None
132
+
133
+ if not opt.INRDecode:
134
+ pred_harmonized_image = None
135
+ else:
136
+ pred_harmonized_image = pred_fg_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
137
+ lut_transform_image = lut_transform_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
138
+
139
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
140
+ pred_harmonized_image, lut_transform_image, opt, -1, show=False,
141
+ wandb=False, isAll=True, step=step)
142
+
143
+ if opt.INRDecode:
144
+ mse, fmse, psnr, ssim = metrics.calc_metrics(misc.normalize(pred_harmonized_image, opt, 'inv'),
145
+ misc.normalize(real_image, opt, 'inv'), mask)
146
+
147
+ lut_mse, lut_fmse, lut_psnr, lut_ssim = metrics.calc_metrics(misc.normalize(lut_transform_image, opt, 'inv'),
148
+ misc.normalize(real_image, opt, 'inv'), mask)
149
+
150
+ for idx in range(len(category)):
151
+ if opt.INRDecode:
152
+ metric_log[category[idx]]['Samples'] += 1
153
+ metric_log[category[idx]]['MSE'] += mse[idx]
154
+ metric_log[category[idx]]['fMSE'] += fmse[idx]
155
+ metric_log[category[idx]]['PSNR'] += psnr[idx]
156
+ metric_log[category[idx]]['SSIM'] += ssim[idx]
157
+
158
+ metric_log['All']['Samples'] += 1
159
+ metric_log['All']['MSE'] += mse[idx]
160
+ metric_log['All']['fMSE'] += fmse[idx]
161
+ metric_log['All']['PSNR'] += psnr[idx]
162
+ metric_log['All']['SSIM'] += ssim[idx]
163
+
164
+ lut_metric_log[category[idx]]['Samples'] += 1
165
+ lut_metric_log[category[idx]]['MSE'] += lut_mse[idx]
166
+ lut_metric_log[category[idx]]['fMSE'] += lut_fmse[idx]
167
+ lut_metric_log[category[idx]]['PSNR'] += lut_psnr[idx]
168
+ lut_metric_log[category[idx]]['SSIM'] += lut_ssim[idx]
169
+
170
+ lut_metric_log['All']['Samples'] += 1
171
+ lut_metric_log['All']['MSE'] += lut_mse[idx]
172
+ lut_metric_log['All']['fMSE'] += lut_fmse[idx]
173
+ lut_metric_log['All']['PSNR'] += lut_psnr[idx]
174
+ lut_metric_log['All']['SSIM'] += lut_ssim[idx]
175
+
176
+ if (step + 1) / len(val_loader) * 100 >= current_process:
177
+ logger.info(f'Processing: {current_process}')
178
+ current_process += 10
179
+
180
+ logger.info('=========================')
181
+ for key in metric_log.keys():
182
+ if opt.INRDecode:
183
+ msg = f"{key}-'MSE': {metric_log[key]['MSE'] / metric_log[key]['Samples']:.2f}\n" \
184
+ f"{key}-'fMSE': {metric_log[key]['fMSE'] / metric_log[key]['Samples']:.2f}\n" \
185
+ f"{key}-'PSNR': {metric_log[key]['PSNR'] / metric_log[key]['Samples']:.2f}\n" \
186
+ f"{key}-'SSIM': {metric_log[key]['SSIM'] / metric_log[key]['Samples']:.4f}\n" \
187
+ f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
188
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
189
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
190
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
191
+ else:
192
+ msg = f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
193
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
194
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
195
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
196
+
197
+ logger.info(msg)
198
+
199
+ logger.info('=========================')
200
+
201
+
202
+ def main_process(opt):
203
+ logger = misc.create_logger(os.path.join(opt.save_path, "log.txt"))
204
+ cudnn.benchmark = True
205
+
206
+ valset_path = os.path.join(opt.dataset_path, "IHD_test.txt")
207
+
208
+ opt.transform_mean = [.5, .5, .5]
209
+ opt.transform_var = [.5, .5, .5]
210
+ torch_transform = transforms.Compose([transforms.ToTensor(),
211
+ transforms.Normalize(opt.transform_mean, opt.transform_var)])
212
+
213
+ valset_alb_transform = albumentations.Compose([Resize(opt.input_size, opt.input_size)],
214
+ additional_targets={'real_image': 'image', 'object_mask': 'image'})
215
+
216
+ valset = dataset_generator(valset_path, valset_alb_transform, torch_transform, opt, mode='Val')
217
+
218
+ val_loader = DataLoader(valset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
219
+ num_workers=opt.workers, persistent_workers=True)
220
+
221
+ model = build_model(opt).to(opt.device)
222
+ logger.info(f"Load pretrained weight from {opt.pretrained}")
223
+
224
+ load_dict = torch.load(opt.pretrained)['model']
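+ # The loop below only reports checkpoint keys that have no counterpart in the model; loading with strict=False then skips them.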
225
+ for k in load_dict.keys():
226
+ if k not in model.state_dict().keys():
227
+ print(f"Skip {k}")
228
+ model.load_state_dict(load_dict, strict=False)
229
+
230
+ inference(val_loader, model, logger, opt)
231
+
232
+
233
+ if __name__ == '__main__':
234
+ opt = parse_args()
235
+ os.makedirs(opt.save_path, exist_ok=True)
236
+ main_process(opt)
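+ # Example launch (paths are placeholders): python inference.py --dataset_path /path/to/iHarmony4 --pretrained /path/to/Resolution_RAW_iHarmony4.pth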
inference_for_arbitrary_resolution_image.py ADDED
@@ -0,0 +1,337 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+
3
+ import torch.backends.cudnn as cudnn
4
+ import torchvision.transforms as transforms
5
+ from torch.utils.data import DataLoader
6
+
7
+ from model.build_model import build_model
8
+
9
+ import torch
10
+ import cv2
11
+ import numpy as np
12
+ import torchvision
13
+ import os
14
+ import tqdm
15
+ import time
16
+
17
+ from utils.misc import prepare_cooridinate_input, customRandomCrop
18
+
19
+ from datasets.build_INR_dataset import Implicit2DGenerator
20
+ import albumentations
21
+ from albumentations import Resize
22
+ from torch.utils.data import DataLoader
23
+ from utils.misc import normalize
24
+
25
+ import math
26
+
27
+
28
+ class single_image_dataset(torch.utils.data.Dataset):
29
+ def __init__(self, opt, composite_image=None, mask=None):
30
+ super().__init__()
31
+
32
+ self.opt = opt
33
+
34
+ if composite_image is None:
35
+ composite_image = cv2.imread(opt.composite_image)
36
+ composite_image = cv2.cvtColor(composite_image, cv2.COLOR_BGR2RGB)
37
+ self.composite_image = composite_image
38
+
39
+ if mask is None:
40
+ mask = cv2.imread(opt.mask)
41
+ mask = mask[:, :, 0].astype(np.float32) / 255.
42
+ self.mask = mask
43
+
44
+ self.torch_transforms = transforms.Compose([transforms.ToTensor(),
45
+ transforms.Normalize([.5, .5, .5], [.5, .5, .5])])
46
+ self.INR_dataset = Implicit2DGenerator(opt, 'Val')
47
+
48
+ self.split_width_resolution = composite_image.shape[1] // opt.split_num
49
+ self.split_height_resolution = composite_image.shape[0] // opt.split_num
50
+
51
+ self.split_width_resolution = self.split_height_resolution = min(self.split_width_resolution,
52
+ self.split_height_resolution)
53
+
54
+ if self.split_width_resolution % 4 != 0:
55
+ self.split_width_resolution = self.split_width_resolution + (4 - self.split_width_resolution % 4)
56
+
57
+ if self.split_height_resolution % 4 != 0:
58
+ self.split_height_resolution = self.split_height_resolution + (4 - self.split_height_resolution % 4)
59
+
60
+ self.num_w = math.ceil(composite_image.shape[1] / self.split_width_resolution)
61
+ self.num_h = math.ceil(composite_image.shape[0] / self.split_height_resolution)
62
+
63
+ self.split_start_point = []
64
+
65
+ "Split the image into several parts."
66
+ for i in range(self.num_h):
67
+ for j in range(self.num_w):
68
+ if i == composite_image.shape[0] // self.split_height_resolution:
69
+ if j == composite_image.shape[1] // self.split_width_resolution:
70
+ self.split_start_point.append((composite_image.shape[0] - self.split_height_resolution,
71
+ composite_image.shape[1] - self.split_width_resolution))
72
+ else:
73
+ self.split_start_point.append(
74
+ (composite_image.shape[0] - self.split_height_resolution, j * self.split_width_resolution))
75
+ else:
76
+ if j == composite_image.shape[1] // self.split_width_resolution:
77
+ self.split_start_point.append(
78
+ (i * self.split_height_resolution, composite_image.shape[1] - self.split_width_resolution))
79
+ else:
80
+ self.split_start_point.append(
81
+ (i * self.split_height_resolution, j * self.split_width_resolution))
82
+
83
+ assert len(self.split_start_point) == self.num_w * self.num_h
84
+
85
+ print(
86
+ f"The image will be split into {self.num_h} pieces in height, and {self.num_w} pieces in width. Totally {self.num_h * self.num_w} patches.")
87
+ print(f"The final resolution of each patch is {self.split_height_resolution} x {self.split_width_resolution}")
88
+
89
+ def __len__(self):
90
+ return self.num_w * self.num_h
91
+
92
+ def __getitem__(self, idx):
93
+ composite_image = self.composite_image
94
+
95
+ mask = self.mask
96
+
97
+ full_coord = prepare_cooridinate_input(mask).transpose(1, 2, 0)
98
+
99
+ tmp_transform = albumentations.Compose([Resize(self.opt.base_size, self.opt.base_size)],
100
+ additional_targets={'object_mask': 'image'})
101
+ transform_out = tmp_transform(image=composite_image, object_mask=mask)
102
+ compos_list = [self.torch_transforms(transform_out['image'])]
103
+ mask_list = [
104
+ torchvision.transforms.ToTensor()(transform_out['object_mask'][..., np.newaxis].astype(np.float32))]
105
+ coord_map_list = []
106
+
107
+ if composite_image.shape[0] != self.split_height_resolution:
108
+ c_h = self.split_start_point[idx][0] / (composite_image.shape[0] - self.split_height_resolution)
109
+ else:
110
+ c_h = 0
111
+ if composite_image.shape[1] != self.split_width_resolution:
112
+ c_w = self.split_start_point[idx][1] / (composite_image.shape[1] - self.split_width_resolution)
113
+ else:
114
+ c_w = 0
115
+ transform_out, c_h, c_w = customRandomCrop([composite_image, mask, full_coord],
116
+ self.split_height_resolution, self.split_width_resolution, c_h, c_w)
117
+
118
+ compos_list.append(self.torch_transforms(transform_out[0]))
119
+ mask_list.append(
120
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
121
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
122
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
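+ # The loop below also builds half- and quarter-resolution crops of the same region for the multi-scale decoder; c_h / c_w are reused so all crops stay spatially aligned.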
123
+ for n in range(2):
124
+ tmp_comp = cv2.resize(composite_image, (
125
+ composite_image.shape[1] // 2 ** (n + 1), composite_image.shape[0] // 2 ** (n + 1)))
126
+ tmp_mask = cv2.resize(mask, (mask.shape[1] // 2 ** (n + 1), mask.shape[0] // 2 ** (n + 1)))
127
+ tmp_coord = prepare_cooridinate_input(tmp_mask).transpose(1, 2, 0)
128
+
129
+ transform_out, c_h, c_w = customRandomCrop([tmp_comp, tmp_mask, tmp_coord],
130
+ self.split_height_resolution // 2 ** (n + 1),
131
+ self.split_width_resolution // 2 ** (n + 1), c_h, c_w)
132
+ compos_list.append(self.torch_transforms(transform_out[0]))
133
+ mask_list.append(
134
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
135
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
136
+ out_comp = compos_list
137
+ out_mask = mask_list
138
+ out_coord = coord_map_list
139
+
140
+ fg_INR_coordinates, bg_INR_coordinates, fg_INR_RGB, fg_transfer_INR_RGB, bg_INR_RGB = self.INR_dataset.generator(
141
+ self.torch_transforms, transform_out[0], transform_out[0], mask)
142
+
143
+ return {
144
+ 'composite_image': out_comp,
145
+ 'mask': out_mask,
146
+ 'coordinate_map': out_coord,
147
+ 'composite_image0': out_comp[0],
148
+ 'mask0': out_mask[0],
149
+ 'coordinate_map0': out_coord[0],
150
+ 'composite_image1': out_comp[1],
151
+ 'mask1': out_mask[1],
152
+ 'coordinate_map1': out_coord[1],
153
+ 'composite_image2': out_comp[2],
154
+ 'mask2': out_mask[2],
155
+ 'coordinate_map2': out_coord[2],
156
+ 'composite_image3': out_comp[3],
157
+ 'mask3': out_mask[3],
158
+ 'coordinate_map3': out_coord[3],
159
+ 'fg_INR_coordinates': fg_INR_coordinates,
160
+ 'bg_INR_coordinates': bg_INR_coordinates,
161
+ 'fg_INR_RGB': fg_INR_RGB,
162
+ 'fg_transfer_INR_RGB': fg_transfer_INR_RGB,
163
+ 'bg_INR_RGB': bg_INR_RGB,
164
+ 'start_point': self.split_start_point[idx],
165
+ }
166
+
167
+
168
+ def parse_args():
169
+ parser = argparse.ArgumentParser()
170
+
171
+ parser.add_argument('--split_num', type=int, default=4,
172
+ help='How many pieces to split the image into along its width / height.')
173
+
174
+ parser.add_argument('--composite_image', type=str, default=r'./demo/demo_2k_composite.jpg',
175
+ help='composite image path')
176
+
177
+ parser.add_argument('--mask', type=str, default=r'./demo/demo_2k_mask.jpg',
178
+ help='mask path')
179
+
180
+ parser.add_argument('--save_path', type=str, default=r'./demo/',
181
+ help='save path')
182
+
183
+ parser.add_argument('--workers', type=int, default=8,
184
+ metavar='N', help='Dataloader threads.')
185
+
186
+ parser.add_argument('--batch_size', type=int, default=1,
187
+ help='You can override the model batch size by specifying a positive number.')
188
+
189
+ parser.add_argument('--device', type=str, default='cuda',
190
+ help="Whether use cuda, 'cuda' or 'cpu'.")
191
+
192
+ parser.add_argument('--base_size', type=int, default=256,
193
+ help='Base size. Resolution of the image input into the Encoder')
194
+
195
+ parser.add_argument('--input_size', type=int, default=256,
196
+ help='Input size. Resolution of the image to be generated by the Decoder.')
197
+
198
+ parser.add_argument('--INR_input_size', type=int, default=256,
199
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
200
+ 'Should be the same as `input_size`')
201
+
202
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
203
+ help='Number of channels for INR linear layer.')
204
+
205
+ parser.add_argument('--LUT_dim', type=int, default=7,
206
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
207
+
208
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
209
+ help='INR activation layer type: leakyrelu_pe, sine')
210
+
211
+ parser.add_argument('--pretrained', type=str,
212
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
213
+ help='Pretrained weight path')
214
+
215
+ parser.add_argument('--param_factorize_dim', type=int,
216
+ default=10,
217
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
218
+ 'Refer to https://arxiv.org/abs/2011.12026')
219
+
220
+ parser.add_argument('--embedding_type', type=str,
221
+ default="CIPS_embed",
222
+ help='Which embedding_type to use.')
223
+
224
+ parser.add_argument('--INRDecode', action="store_false",
225
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
226
+ '(https://github.com/SamsungLabs/image_harmonization)')
227
+
228
+ parser.add_argument('--isMoreINRInput', action="store_false",
229
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
230
+
231
+ parser.add_argument('--hr_train', action="store_false",
232
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
233
+
234
+ parser.add_argument('--isFullRes', action="store_true",
235
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
236
+
237
+ opt = parser.parse_args()
238
+
239
+ return opt
240
+
241
+ @torch.no_grad()
242
+ def inference(model, opt, composite_image=None, mask=None):
243
+ model.eval()
244
+
245
+ "dataset here is actually consisted of several patches of a single image."
246
+ singledataset = single_image_dataset(opt, composite_image, mask)
247
+
248
+ single_data_loader = DataLoader(singledataset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
249
+ num_workers=opt.workers, persistent_workers=False if composite_image is not None else True)
250
+
251
+ "Init a pure black image with the same size as the input image."
252
+ init_img = np.zeros_like(singledataset.composite_image)
253
+
254
+ time_all = 0
255
+
256
+ for step, batch in tqdm.tqdm(enumerate(single_data_loader)):
257
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
258
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
259
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
260
+ start_points = batch['start_point']
261
+
262
+ if opt.batch_size == 1:
263
+ start_points = [torch.cat(start_points)]
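+ # With batch_size 1 the default collate yields the (row, col) start point as two one-element tensors, so the line above joins them back into a single (row, col) tensor for the pasting step.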
264
+
265
+ fg_INR_coordinates = coordinate_map[1:]
266
+
267
+ try:
268
+ if step == 0: # This is for CUDA Kernel Warm-up, or the first inference step will be quite slow.
269
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
270
+ composite_image,
271
+ mask,
272
+ fg_INR_coordinates,
273
+ )
274
+ if opt.device == "cuda":
275
+ torch.cuda.reset_max_memory_allocated()
276
+ torch.cuda.reset_max_memory_cached()
277
+ start_time = time.time()
278
+ torch.cuda.synchronize()
279
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
280
+ composite_image,
281
+ mask,
282
+ fg_INR_coordinates,
283
+ )
284
+ if opt.device == "cuda":
285
+ torch.cuda.synchronize()
286
+ end_time = time.time()
287
+
288
+ end_max_memory = torch.cuda.max_memory_allocated() // 1024 ** 2
289
+ end_memory = torch.cuda.memory_allocated() // 1024 ** 2
290
+
291
+ print(f'GPU max memory usage: {end_max_memory} MB')
292
+ print(f'GPU memory usage: {end_memory} MB')
293
+ time_all += (end_time - start_time)
294
+ print(f'progress: {step} / {len(single_data_loader)}')
295
+ except:
296
+ raise Exception(
297
+ f'The image resolution is too large. Please increase the `split_num` value. Your current setting is {opt.split_num}')
298
+
299
+ "Assemble the every patch's harmonized result into the final whole image."
300
+ for id in range(len(fg_INR_coordinates[0])):
301
+ pred_fg_image = fg_content_bg_appearance_construct[-1][id]
302
+ pred_harmonized_image = pred_fg_image * (mask[1][id] > 100 / 255.) + composite_image[1][id] * (
303
+ ~(mask[1][id] > 100 / 255.))
304
+
305
+ pred_harmonized_tmp = cv2.cvtColor(
306
+ normalize(pred_harmonized_image.unsqueeze(0), opt, 'inv')[0].permute(1, 2, 0).cpu().mul_(255.).clamp_(
307
+ 0., 255.).numpy().astype(np.uint8), cv2.COLOR_RGB2BGR)
308
+
309
+ init_img[start_points[id][0]:start_points[id][0] + singledataset.split_height_resolution,
310
+ start_points[id][1]:start_points[id][1] + singledataset.split_width_resolution] = pred_harmonized_tmp
311
+
312
+ print(f'Inference time: {time_all}')
313
+ if opt.save_path is not None:
314
+ os.makedirs(opt.save_path, exist_ok=True)
315
+ cv2.imwrite(os.path.join(opt.save_path, "pred_harmonized_image.jpg"), init_img)
316
+ return init_img
317
+
318
+
319
+ def main_process(opt, composite_image=None, mask=None):
320
+ cudnn.benchmark = True
321
+
322
+ model = build_model(opt).to(opt.device)
323
+
324
+ load_dict = torch.load(opt.pretrained)['model']
325
+ for k in load_dict.keys():
326
+ if k not in model.state_dict().keys():
327
+ print(f"Skip {k}")
328
+ model.load_state_dict(load_dict, strict=False)
329
+
330
+ return inference(model, opt, composite_image, mask)
331
+
332
+
333
+ if __name__ == '__main__':
334
+ opt = parse_args()
335
+ opt.transform_mean = [.5, .5, .5]
336
+ opt.transform_var = [.5, .5, .5]
337
+ main_process(opt)
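+ # Example launch with the bundled demo inputs: python inference_for_arbitrary_resolution_image.py --composite_image ./demo/demo_2k_composite.jpg --mask ./demo/demo_2k_mask.jpg --split_num 4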
processing.py ADDED
@@ -0,0 +1,308 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import time
3
+ import datetime
4
+
5
+ import torch
6
+ import torchvision
7
+
8
+ from utils import misc, metrics
9
+
10
+ best_psnr = 0
11
+
12
+
13
+ def train(train_loader, val_loader, model, optimizer, scheduler, loss_fn, logger, opt):
14
+ total_step = opt.epochs * len(train_loader)
15
+
16
+ step_time_log = misc.AverageMeter()
17
+ loss_log = misc.AverageMeter(':6f')
18
+ loss_fg_content_bg_appearance_construct_log = misc.AverageMeter(':6f')
19
+ loss_lut_transform_image_log = misc.AverageMeter(':6f')
20
+ loss_lut_regularize_log = misc.AverageMeter(':6f')
21
+
22
+ start_epoch = 0
23
+
24
+ "Load pretrained checkpoints"
25
+ if opt.pretrained is not None:
26
+ logger.info(f"Load pretrained weight from {opt.pretrained}")
27
+ load_state = torch.load(opt.pretrained)
28
+ model = model.cpu()
29
+ model.load_state_dict(load_state['model'])
30
+ model = model.to(opt.device)
31
+ optimizer.load_state_dict(load_state['optimizer'])
32
+ scheduler.load_state_dict(load_state['scheduler'])
33
+ start_epoch = load_state['last_epoch'] + 1
34
+
35
+ for epoch in range(start_epoch, opt.epochs):
36
+ model.train()
37
+ time_ckp = time.time()
38
+ for step, batch in enumerate(train_loader):
39
+ current_step = epoch * len(train_loader) + step + 1
40
+
41
+ if opt.INRDecode and opt.hr_train:
42
+ "List with 4 elements: [Input to Encoder, three different resolutions' crop to INR Decoder]"
43
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
44
+ real_image = [batch[f'real_image{name}'].to(opt.device) for name in range(4)]
45
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
46
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
47
+
48
+ fg_INR_coordinates = coordinate_map[1:]
49
+
50
+ else:
51
+ composite_image = batch['composite_image'].to(opt.device)
52
+ real_image = batch['real_image'].to(opt.device)
53
+ mask = batch['mask'].to(opt.device)
54
+
55
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
56
+
57
+ fg_content_bg_appearance_construct, fit_lut3d, lut_transform_image = model(
58
+ composite_image, mask, fg_INR_coordinates)
59
+
60
+ if opt.INRDecode:
61
+ loss_fg_content_bg_appearance_construct = 0
62
+ """
63
+ Our LRIP module requires three layers at different resolutions, so
64
+ `loss_fg_content_bg_appearance_construct` is accumulated over multiple layers.
65
+ Besides, when leveraging `hr_train`, i.e. using the RSC strategy (see Section 3.4), `real_image`
66
+ and `mask` are lists holding crops at the corresponding resolutions.
67
+ """
68
+ if opt.hr_train:
69
+ for n in range(3):
70
+ loss_fg_content_bg_appearance_construct += loss_fn['masked_mse'] \
71
+ (fg_content_bg_appearance_construct[n], real_image[3 - n], mask[3 - n])
72
+ loss_fg_content_bg_appearance_construct /= 3
73
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image[1], mask[1])
74
+ else:
75
+ for n in range(3):
76
+ loss_fg_content_bg_appearance_construct += loss_fn['MaskWeightedMSE'] \
77
+ (fg_content_bg_appearance_construct[n],
78
+ torchvision.transforms.Resize(opt.INR_input_size // 2 ** (3 - n - 1))(real_image),
79
+ torchvision.transforms.Resize(opt.INR_input_size // 2 ** (3 - n - 1))(mask))
80
+ loss_fg_content_bg_appearance_construct /= 3
81
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image, mask)
82
+ loss_lut_regularize = loss_fn['regularize_LUT'](fit_lut3d)
83
+
84
+ else:
85
+ loss_fg_content_bg_appearance_construct = 0
86
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image, mask)
87
+ loss_lut_regularize = 0
88
+
89
+ loss = loss_fg_content_bg_appearance_construct + loss_lut_transform_image + loss_lut_regularize
90
+ optimizer.zero_grad()
91
+ loss.backward()
92
+ optimizer.step()
93
+ scheduler.step()
94
+
95
+ step_time_log.update(time.time() - time_ckp)
96
+
97
+ loss_fg_content_bg_appearance_construct_log.update(0 if isinstance(loss_fg_content_bg_appearance_construct,
98
+ int) else loss_fg_content_bg_appearance_construct.item())
99
+ loss_lut_transform_image_log.update(
100
+ 0 if isinstance(loss_lut_transform_image, int) else loss_lut_transform_image.item())
101
+ loss_lut_regularize_log.update(0 if isinstance(loss_lut_regularize, int) else loss_lut_regularize.item())
102
+ loss_log.update(loss.item())
103
+
104
+ if current_step % opt.print_freq == 0:
105
+ remain_secs = (total_step - current_step) * step_time_log.avg
106
+ remain_time = datetime.timedelta(seconds=round(remain_secs))
107
+ finish_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time() + remain_secs))
108
+
109
+ log_msg = f'Epoch: [{epoch}/{opt.epochs}]\t' \
110
+ f'Step: [{step}/{len(train_loader)}]\t' \
111
+ f'StepTime {step_time_log.val:.3f} ({step_time_log.avg:.3f})\t' \
112
+ f'lr {optimizer.param_groups[0]["lr"]}\t' \
113
+ f'Loss {loss_log.val:.4f} ({loss_log.avg:.4f})\t' \
114
+ f'Loss_fg_bg_cons {loss_fg_content_bg_appearance_construct_log.val:.4f} ({loss_fg_content_bg_appearance_construct_log.avg:.4f})\t' \
115
+ f'Loss_lut_trans {loss_lut_transform_image_log.val:.4f} ({loss_lut_transform_image_log.avg:.4f})\t' \
116
+ f'Loss_lut_reg {loss_lut_regularize_log.val:.4f} ({loss_lut_regularize_log.avg:.4f})\t' \
117
+ f'Remaining Time {remain_time} ({finish_time})'
118
+ logger.info(log_msg)
119
+
120
+ if opt.wandb:
121
+ import wandb
122
+ wandb.log(
123
+ {'Train/Epoch': epoch, 'Train/lr': optimizer.param_groups[0]['lr'], 'Train/Step': current_step,
124
+ 'Train/Loss': loss_log.val,
125
+ 'Train/Loss_fg_bg_cons': loss_fg_content_bg_appearance_construct_log.val,
126
+ 'Train/Loss_lut_trans': loss_lut_transform_image_log.val,
127
+ 'Train/Loss_lut_reg': loss_lut_regularize_log.val,
128
+ })
129
+
130
+ time_ckp = time.time()
131
+
132
+ state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'last_epoch': epoch,
133
+ 'scheduler': scheduler.state_dict()}
134
+
135
+ """
136
+ Since validation at the original resolution has no consistent image size across samples
137
+ (so the images cannot form a batch) and may also cause out-of-memory issues when combined with the training phase,
138
+ we only save the model here when `opt.isFullRes` is True, leaving the evaluation to `inference.py`.
139
+ """
140
+ if opt.isFullRes and opt.hr_train:
141
+ if epoch % 5 == 0:
142
+ torch.save(state, os.path.join(opt.save_path, f"epoch{epoch}.pth"))
143
+ else:
144
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
145
+ else:
146
+ val(val_loader, model, logger, opt, state)
147
+
148
+
149
+ def val(val_loader, model, logger, opt, state):
150
+ global best_psnr
151
+ current_process = 10
152
+ model.eval()
153
+
154
+ metric_log = {
155
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
156
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
157
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
158
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
159
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
160
+ }
161
+
162
+ lut_metric_log = {
163
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
164
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
165
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
166
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
167
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
168
+ }
169
+
170
+ for step, batch in enumerate(val_loader):
171
+ composite_image = batch['composite_image'].to(opt.device)
172
+ real_image = batch['real_image'].to(opt.device)
173
+ mask = batch['mask'].to(opt.device)
174
+ category = batch['category']
175
+
176
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
177
+ bg_INR_coordinates = batch['bg_INR_coordinates'].to(opt.device)
178
+ fg_transfer_INR_RGB = batch['fg_transfer_INR_RGB'].to(opt.device)
179
+
180
+ with torch.no_grad():
181
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
182
+ composite_image,
183
+ mask,
184
+ fg_INR_coordinates,
185
+ bg_INR_coordinates)
186
+ if opt.INRDecode:
187
+ pred_fg_image = fg_content_bg_appearance_construct[-1]
188
+ else:
189
+ pred_fg_image = None
190
+ fg_transfer_INR_RGB = misc.lin2img(fg_transfer_INR_RGB,
191
+ val_loader.dataset.INR_dataset.size) if fg_transfer_INR_RGB is not None else None
192
+
193
+ "For INR"
194
+ mask_INR = torchvision.transforms.Resize(opt.INR_input_size)(mask)
195
+
196
+ if not opt.INRDecode:
197
+ pred_harmonized_image = None
198
+ else:
199
+ pred_harmonized_image = pred_fg_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
200
+ lut_transform_image = lut_transform_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
201
+
202
+ "Save the output images. For every 10 epochs, save more results, otherwise, save little. Thus save storage."
203
+ if state['last_epoch'] % 10 == 0:
204
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
205
+ pred_harmonized_image, lut_transform_image, opt, state['last_epoch'], show=False,
206
+ wandb=opt.wandb, isAll=True, step=step)
207
+ elif step == 0:
208
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
209
+ pred_harmonized_image, lut_transform_image, opt, state['last_epoch'], show=False,
210
+ wandb=opt.wandb, step=step)
211
+
212
+ if opt.INRDecode:
213
+ mse, fmse, psnr, ssim = metrics.calc_metrics(misc.normalize(pred_harmonized_image, opt, 'inv'),
214
+ misc.normalize(fg_transfer_INR_RGB, opt, 'inv'), mask_INR)
215
+
216
+ lut_mse, lut_fmse, lut_psnr, lut_ssim = metrics.calc_metrics(misc.normalize(lut_transform_image, opt, 'inv'),
217
+ misc.normalize(real_image, opt, 'inv'), mask)
218
+
219
+ for idx in range(len(category)):
220
+ if opt.INRDecode:
221
+ metric_log[category[idx]]['Samples'] += 1
222
+ metric_log[category[idx]]['MSE'] += mse[idx]
223
+ metric_log[category[idx]]['fMSE'] += fmse[idx]
224
+ metric_log[category[idx]]['PSNR'] += psnr[idx]
225
+ metric_log[category[idx]]['SSIM'] += ssim[idx]
226
+
227
+ metric_log['All']['Samples'] += 1
228
+ metric_log['All']['MSE'] += mse[idx]
229
+ metric_log['All']['fMSE'] += fmse[idx]
230
+ metric_log['All']['PSNR'] += psnr[idx]
231
+ metric_log['All']['SSIM'] += ssim[idx]
232
+
233
+ lut_metric_log[category[idx]]['Samples'] += 1
234
+ lut_metric_log[category[idx]]['MSE'] += lut_mse[idx]
235
+ lut_metric_log[category[idx]]['fMSE'] += lut_fmse[idx]
236
+ lut_metric_log[category[idx]]['PSNR'] += lut_psnr[idx]
237
+ lut_metric_log[category[idx]]['SSIM'] += lut_ssim[idx]
238
+
239
+ lut_metric_log['All']['Samples'] += 1
240
+ lut_metric_log['All']['MSE'] += lut_mse[idx]
241
+ lut_metric_log['All']['fMSE'] += lut_fmse[idx]
242
+ lut_metric_log['All']['PSNR'] += lut_psnr[idx]
243
+ lut_metric_log['All']['SSIM'] += lut_ssim[idx]
244
+
245
+ if (step + 1) / len(val_loader) * 100 >= current_process:
246
+ logger.info(f'Processing: {current_process}')
247
+ current_process += 10
248
+
249
+ logger.info('=========================')
250
+ for key in metric_log.keys():
251
+ if opt.INRDecode:
252
+ msg = f"{key}-'MSE': {metric_log[key]['MSE'] / metric_log[key]['Samples']:.2f}\n" \
253
+ f"{key}-'fMSE': {metric_log[key]['fMSE'] / metric_log[key]['Samples']:.2f}\n" \
254
+ f"{key}-'PSNR': {metric_log[key]['PSNR'] / metric_log[key]['Samples']:.2f}\n" \
255
+ f"{key}-'SSIM': {metric_log[key]['SSIM'] / metric_log[key]['Samples']:.4f}\n" \
256
+ f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
257
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
258
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
259
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
260
+ else:
261
+ msg = f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
262
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
263
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
264
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
265
+
266
+ logger.info(msg)
267
+
268
+ if opt.wandb:
269
+ import wandb
270
+ if opt.INRDecode:
271
+ wandb.log(
272
+ {f'Val/{key}/Epoch': state['last_epoch'],
273
+ f'Val/{key}/MSE': metric_log[key]['MSE'] / metric_log[key]['Samples'],
274
+ f'Val/{key}/fMSE': metric_log[key]['fMSE'] / metric_log[key]['Samples'],
275
+ f'Val/{key}/PSNR': metric_log[key]['PSNR'] / metric_log[key]['Samples'],
276
+ f'Val/{key}/SSIM': metric_log[key]['SSIM'] / metric_log[key]['Samples'],
277
+ f'Val/{key}/LUT_MSE': lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples'],
278
+ f'Val/{key}/LUT_fMSE': lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples'],
279
+ f'Val/{key}/LUT_PSNR': lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples'],
280
+ f'Val/{key}/LUT_SSIM': lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']
281
+ })
282
+ else:
283
+ wandb.log(
284
+ {f'Val/{key}/Epoch': state['last_epoch'],
285
+ f'Val/{key}/LUT_MSE': lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples'],
286
+ f'Val/{key}/LUT_fMSE': lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples'],
287
+ f'Val/{key}/LUT_PSNR': lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples'],
288
+ f'Val/{key}/LUT_SSIM': lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']
289
+ })
290
+
291
+ logger.info('=========================')
292
+
293
+ if not opt.INRDecode:
294
+ if lut_metric_log['All']['PSNR'] / lut_metric_log['All']['Samples'] > best_psnr:
295
+ logger.info("Best Save!")
296
+ best_psnr = lut_metric_log['All']['PSNR'] / lut_metric_log['All']['Samples']
297
+ torch.save(state, os.path.join(opt.save_path, "best.pth"))
298
+ else:
299
+ logger.info("Last Save!")
300
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
301
+ else:
302
+ if metric_log['All']['PSNR'] / metric_log['All']['Samples'] > best_psnr:
303
+ logger.info("Best Save!")
304
+ best_psnr = metric_log['All']['PSNR'] / metric_log['All']['Samples']
305
+ torch.save(state, os.path.join(opt.save_path, "best.pth"))
306
+ else:
307
+ logger.info("Last Save!")
308
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
requirements.txt ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ adamp==0.3.0
2
+ albumentations==1.2.0
3
+ numpy==1.21.2
4
+ opencv_python==4.5.4.58
5
+ opencv_python_headless==4.6.0.66
6
+ pytorch_msssim==0.2.1
7
+ scikit_image==0.18.3
8
+ torch==1.12.0+cu113
9
+ torchvision==0.13.0+cu113
10
+ tqdm==4.62.2
11
+ wandb==0.12.21
train.py ADDED
@@ -0,0 +1,161 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import argparse
3
+
4
+ import albumentations
5
+ from albumentations import HorizontalFlip, Resize, RandomResizedCrop
6
+
7
+ import torch.backends.cudnn as cudnn
8
+ import torchvision.transforms as transforms
9
+ from torch.utils.data import DataLoader
10
+ from torch.optim import lr_scheduler
11
+
12
+ import processing
13
+ from utils import build_loss, misc
14
+ from model.build_model import build_model
15
+ from datasets.build_dataset import dataset_generator
16
+
17
+
18
+ def parse_args():
19
+ parser = argparse.ArgumentParser()
20
+
21
+ parser.add_argument('--workers', type=int, default=8,
22
+ metavar='N', help='Dataloader threads.')
23
+
24
+ parser.add_argument('--batch_size', type=int, default=16,
25
+ help='You can override the model batch size by specifying a positive number.')
26
+
27
+ parser.add_argument('--device', type=str, default='cuda',
28
+ help="Whether use cuda, 'cuda' or 'cpu'.")
29
+
30
+ parser.add_argument('--epochs', type=int, default=60,
31
+ help='Epochs number.')
32
+
33
+ parser.add_argument('--lr', type=float, default=1e-4,
34
+ help='Learning rate.')
35
+
36
+ parser.add_argument('--save_path', type=str, default="./logs",
37
+ help='Where to save logs and checkpoints.')
38
+
39
+ parser.add_argument('--dataset_path', type=str, default=r".\iHarmony4",
40
+ help='Dataset path.')
41
+
42
+ parser.add_argument('--print_freq', type=int, default=100,
43
+ help='Number of iterations then print.')
44
+
45
+ parser.add_argument('--base_size', type=int, default=256,
46
+ help='Base size. Resolution of the image input into the Encoder')
47
+
48
+ parser.add_argument('--input_size', type=int, default=256,
49
+ help='Input size. Resolution of the image to be generated by the Decoder.')
50
+
51
+ parser.add_argument('--INR_input_size', type=int, default=256,
52
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
53
+ 'Should be the same as `input_size`')
54
+
55
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
56
+ help='Number of channels for INR linear layer.')
57
+
58
+ parser.add_argument('--LUT_dim', type=int, default=7,
59
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
60
+
61
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
62
+ help='INR activation layer type: leakyrelu_pe, sine')
63
+
64
+ parser.add_argument('--pretrained', type=str,
65
+ default=None,
66
+ help='Pretrained weight path')
67
+
68
+ parser.add_argument('--param_factorize_dim', type=int,
69
+ default=10,
70
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
71
+ 'Refer to https://arxiv.org/abs/2011.12026')
72
+
73
+ parser.add_argument('--embedding_type', type=str,
74
+ default="CIPS_embed",
75
+ help='Which embedding_type to use.')
76
+
77
+ parser.add_argument('--optim', type=str,
78
+ default='adamw',
79
+ help='Which optimizer to use.')
80
+
81
+ parser.add_argument('--INRDecode', action="store_false",
82
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
83
+ '(https://github.com/SamsungLabs/image_harmonization)')
84
+
85
+ parser.add_argument('--isMoreINRInput', action="store_false",
86
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
87
+
88
+ parser.add_argument('--hr_train', action="store_true",
89
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
90
+
91
+ parser.add_argument('--isFullRes', action="store_true",
92
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
93
+
94
+ opt = parser.parse_args()
95
+
96
+ opt.save_path = misc.increment_path(os.path.join(opt.save_path, "exp1"))
97
+
98
+ try:
99
+ import wandb
100
+ opt.wandb = True
101
+ wandb.init(config=opt, project="INR_Harmonization", name=os.path.basename(opt.save_path))
102
+
103
+ except:
104
+ opt.wandb = False
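+ # Any failure to import or initialize wandb simply disables wandb logging for this run.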
105
+
106
+ return opt
107
+
108
+
109
+ def main_process(opt):
110
+ logger = misc.create_logger(os.path.join(opt.save_path, "log.txt"))
111
+ cudnn.benchmark = True
112
+
113
+ trainset_path = os.path.join(opt.dataset_path, "IHD_train.txt")
114
+ valset_path = os.path.join(opt.dataset_path, "IHD_test.txt")
115
+
116
+ opt.transform_mean = [.5, .5, .5]
117
+ opt.transform_var = [.5, .5, .5]
118
+ torch_transform = transforms.Compose([transforms.ToTensor(),
119
+ transforms.Normalize(opt.transform_mean, opt.transform_var)])
120
+
121
+ trainset_alb_transform = albumentations.Compose(
122
+ [
123
+ RandomResizedCrop(opt.input_size, opt.input_size, scale=(0.5, 1.0)),
124
+ HorizontalFlip()],
125
+ additional_targets={'real_image': 'image', 'object_mask': 'image'}
126
+ )
127
+
128
+ valset_alb_transform = albumentations.Compose([Resize(opt.input_size, opt.input_size)],
129
+ additional_targets={'real_image': 'image', 'object_mask': 'image'})
130
+
131
+ trainset = dataset_generator(trainset_path, trainset_alb_transform, torch_transform, opt, mode='Train')
132
+
133
+ valset = dataset_generator(valset_path, valset_alb_transform, torch_transform, opt, mode='Val')
134
+
135
+ train_loader = DataLoader(trainset, opt.batch_size, shuffle=True, drop_last=True,
136
+ pin_memory=True,
137
+ num_workers=opt.workers, persistent_workers=True)
138
+
139
+ val_loader = DataLoader(valset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
140
+ num_workers=opt.workers, persistent_workers=True)
141
+
142
+ model = build_model(opt).to(opt.device)
143
+
144
+ loss_fn = build_loss.loss_generator()
145
+
146
+ optimizer_params = {
147
+ 'lr': opt.lr,
148
+ 'weight_decay': 1e-2
149
+ }
150
+ optimizer = misc.get_optimizer(model, opt.optim, optimizer_params)
151
+
152
+ scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr=opt.lr, total_steps=opt.epochs * len(train_loader),
153
+ pct_start=0.0)
154
+
155
+ processing.train(train_loader, val_loader, model, optimizer, scheduler, loss_fn, logger, opt)
156
+
157
+
158
+ if __name__ == '__main__':
159
+ opt = parse_args()
160
+ os.makedirs(opt.save_path, exist_ok=True)
161
+ main_process(opt)
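+ # Example launch (dataset path is a placeholder): python train.py --dataset_path /path/to/iHarmony4 --batch_size 16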