WindVChen committed on
Commit
6710c89
•
1 Parent(s): d2633d8
.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ .idea/*
2
+ logs/*
3
+ wandb/*
4
+ pretrained_models/*
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,13 +1,295 @@
1
- ---
2
- title: INR Harmon
3
- emoji: πŸŒ–
4
- colorFrom: yellow
5
- colorTo: green
6
- sdk: gradio
7
- sdk_version: 3.37.0
8
- app_file: app.py
9
- pinned: false
10
- license: apache-2.0
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ <div align="center">
2
+
3
+ <h1><a href="https://arxiv.org/abs/2303.01681">Dense Pixel-to-Pixel Harmonization via <br /> Continuous Image Representation</a></h1>
4
+
5
+
6
+ **[Jianqi Chen](https://windvchen.github.io/), [Yilan Zhang](https://scholar.google.com.hk/citations?hl=en&user=wZ4M4ecAAAAJ), [Zhengxia Zou](https://scholar.google.com.hk/citations?hl=en&user=DzwoyZsAAAAJ), [Keyan Chen](https://scholar.google.com.hk/citations?hl=en&user=5RF4ia8AAAAJ),
7
+ and [Zhenwei Shi](https://scholar.google.com.hk/citations?hl=en&user=kNhFWQIAAAAJ)**
8
+
9
+ ![](https://komarev.com/ghpvc/?username=windvchenINR-Harmonization&label=visitors)
10
+ ![GitHub stars](https://badgen.net/github/stars/windvchen/INR-Harmonization)
11
+ [![](https://img.shields.io/badge/license-Apache--2.0-blue)](#License)
12
+ [![](https://img.shields.io/badge/arXiv-2303.01681-b31b1b.svg)](https://arxiv.org/abs/2303.01681)
13
+ <a href="https://huggingface.co/spaces/WindVChen/INR-Harmon"><img alt="Huggingface" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-INR Harmonization-orange"></a>
14
+
15
+ </div>
16
+
17
+ <p align = "center">
18
+ <img src="assets/title_harmon.gif"/ width="200">
19
+ <img src="assets/title_any_image.gif"/ width="200">
20
+ <img src="assets/title_you_want.gif"/ width="200">
21
+ </p>
22
+
23
+ <div align="center">
24
+ <img src="assets/demo.gif" width="600">
25
+ </div>
26
+
27
+
28
+ ### Give us a :star: if this repo helps
29
+
30
+ This repository is the official implementation of ***HINet (or INR-Harmonization)***, which can achieve ***Arbitrary aspect ratio & Arbitrary resolution*** image harmonization. If you encounter any questions, please feel free to contact
+ us. You can create an issue or just send an email to windvchen@gmail.com. Any idea exchange and
+ discussion are also welcome.
33
+
34
+ ## Updates
35
+
36
+ [**07/21/2023**] We achieved that! 🎉🎉 With all **TODOs** complete! Try our [Huggingface Demo]() here!! You can also download this repository and run the GUI locally (refer to [cmd] here)! 🥳🥳
37
+
38
+ [**07/19/2023**] Hi everyone! We have added two new inference
+ scripts: [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py), with which you can harmonize
+ a ***square image*** quite fast,
+ and [inference_for_arbitrary_resolution_image.py](inference_for_arbitrary_resolution_image.py), with which you can harmonize
+ an image at any resolution ***(2K, 4K, 8K, JUST WHATEVER YOU WANT!!)***. Please check them out! 😉😉
43
+
44
+ A summary of the features of the different inference strategies (for more information, please refer to [Inference](#inference)):
45
+
46
+ | Features | [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py) | [inference_for_arbitrary_resolution_image.py](inference_for_arbitrary_resolution_image.py) |
47
+ |:-----------------------:|:----------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------:|
48
+ | Support Arbitrary Image | ❌ *(Only square images)* | ✅ *(Arbitrary aspect ratio, arbitrary resolution!!!)* |
+ | Speed | 🚀 *(Quite fast)* | 🚌 *(Relatively slower than the left one)* |
+ | Memory cost | 🌲 *(Quite low)* | 🏭 *(Relatively higher than the left one for the same resolution)* |
51
+
52
+ [**07/18/2023**] Check out our new work [***Diff-Harmonization***](https://github.com/WindVChen/Diff-Harmonization),
53
+ which is a **Zero-Shot Harmonization** method based on *Diffusion Models*! 😊
54
+
55
+ [**07/17/2023**] Pretrained weights have been released. Feel free to try them! 👋👋
56
+
57
+ [**07/16/2023**] The code is now publicly available. 🥳
58
+
59
+ [**03/06/2023**] Source code and pretrained models will be publicly accessible.
60
+
61
+ ## TODO
62
+
63
+ - [x] Initial code release.
64
+ - [x] Add pretrained model weights.
65
+ - [x] Add the efficient splitting strategy for inferencing on original resolution images.
66
+ - [x] Add Gradio demo.
67
+
68
+ ## Table of Contents
69
+
70
+ - [Abstract](#abstract)
71
+ - [Requirements](#requirements)
72
+ - [Training](#training)
73
+ - [Train in low resolution (LR) mode](#train-in-low-resolution--lr--mode)
74
+ - [Train in high resolution (HR) mode](#train-in-high-resolution--hr--mode--eg-2048x2048-)
75
+ - [Train in original resolution mode](#train-in-original-resolution-mode)
76
+ - [Evaluation](#evaluation)
77
+ - [Evaluation in low resolution (LR) mode](#evaluation-in-low-resolution--lr--mode)
78
+ - [Evaluation in high resolution (HR) mode](#evaluation-in-high-resolution--hr--mode--eg-2048x2048-)
79
+ - [Evaluation in original resolution mode](#evaluation-in-original-resolution-mode)
80
+ - [Inference](#inference)
81
+ - [Inference on square images (fast & low cost)](#inference-on-square-images--fast--low-cost-)
82
+ - [Inference on arbitrary resolution images (Support any resolution)](#Inference-on-arbitrary-resolution-images--slow-high-cost-but-support-any-resolution-)
83
+ - [Results](#results)
84
+ - [Citation & Acknowledgments](#citation--acknowledgments)
85
+ - [License](#license)
86
+
87
+ ## Abstract
88
+
89
+ ![HINet's framework](assets/network.png)
90
+
91
+ High-resolution (HR) image harmonization is of great significance in real-world applications such as image synthesis and
92
+ image editing. However, due to high memory costs, existing dense pixel-to-pixel harmonization methods mainly
+ focus on processing low-resolution (LR) images. Some recent works resort to combining with color-to-color
94
+ transformations but are either limited to certain resolutions or heavily depend on hand-crafted image filters. In this
95
+ work, we explore leveraging the implicit neural representation (INR) and propose a novel
96
+ ***image Harmonization method based on Implicit neural Networks (HINet)***, which to the best of our knowledge, is
97
+ ***the first dense pixel-to-pixel method applicable to HR images without any hand-crafted filter design***. Inspired by
98
+ the Retinex theory, we decouple the MLPs into two parts to respectively capture the content and environment of composite
99
+ images. A Low-Resolution Image Prior (LRIP) network is designed to alleviate the Boundary Inconsistency problem, and we
100
+ also propose new designs for the training and inference process. Extensive experiments have demonstrated the
101
+ effectiveness of our method compared with state-of-the-art methods. Furthermore, some interesting and practical
102
+ applications of the proposed method are explored.
103
+
104
+ ## Requirements
105
+
106
+ 1. Software Requirements
107
+ - Python: 3.8
108
+ - CUDA: 11.3
109
+ - cuDNN: 8.4.1
110
+
111
+ To install other requirements:
112
+
113
+ ```
114
+ pip install -r requirements.txt
115
+ ```
116
+
117
+ 2. Datasets
118
+ - We train and evaluate on the [iHarmony4 dataset](https://github.com/bcmi/Image-Harmonization-Dataset-iHarmony4).
119
+ Please download the dataset in advance, and arrange them into the following structure:
120
+
121
+ ```
122
+ ├── dataset_path
+    ├── HAdobe5k
+       ├── composite_images
+       ├── masks
+       ├── real_images
+    ├── HCOCO
+    ├── Hday2night
+    ├── HFlickr
+    IHD_test.txt
+    IHD_train.txt
132
+ ```
133
+
134
+ - Before training, we resize the HAdobe5k subdataset so that each side is smaller than 1024 pixels, which speeds up data
+ loading. The resizing script is provided in [resize_Adobe.py](tools/resize_Adobe.py); a minimal sketch of this step is shown after this list.
136
+
137
+ - For training or evaluating on the original resolution of the iHarmony4 dataset, please create a new `HAdobe5kori`
+ directory containing the original HAdobe5k images.
139
+
140
+ - If you want to train and evaluate only on the HAdobe5k subdataset (see Table 1 in the paper), you can modify
141
+ the `IHD_train.txt` and `IHD_test.txt` in [train.py](train.py) to only contain the HAdobe5k images.
142
+
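+ For reference, below is a minimal sketch of the resizing step mentioned above (the actual [resize_Adobe.py](tools/resize_Adobe.py) may differ in details; the paths and helper name here are only illustrative):
+
+ ```python
+ # Illustrative sketch: shrink every HAdobe5k image so its longer side fits within 1024.
+ import glob
+ import os
+
+ import cv2
+
+ def resize_long_side(img, limit=1024):
+     # Scale down (never up) so that max(height, width) <= limit, keeping the aspect ratio.
+     h, w = img.shape[:2]
+     scale = limit / max(h, w)
+     if scale >= 1.0:
+         return img
+     return cv2.resize(img, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_LINEAR)
+
+ src_root = "dataset_path/HAdobe5kori"  # original-resolution images (illustrative path)
+ dst_root = "dataset_path/HAdobe5k"     # resized copies used for training
+ for sub in ("composite_images", "masks", "real_images"):
+     os.makedirs(os.path.join(dst_root, sub), exist_ok=True)
+     for path in glob.glob(os.path.join(src_root, sub, "*")):
+         img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
+         # For binary masks, nearest-neighbor interpolation may be preferable.
+         cv2.imwrite(os.path.join(dst_root, sub, os.path.basename(path)), resize_long_side(img))
+ ```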
143
+ 3. Pre-trained Models
144
+ - We adopt [HRNetV2](https://github.com/HRNet/HRNet-Image-Classification) as our encoder. You can download the weight
+ from [here](https://onedrive.live.com/?authkey=%21AMkPimlmClRvmpw&id=F7FD0B7F26543CEB%21112&cid=F7FD0B7F26543CEB&parId=root&parQt=sharedby&parCid=C8304F01C1A85932&o=OneUp)
+ and save it in the `pretrained_models` directory.
148
+ - In the following table, we provide several model weights pretrained at different resolutions (corresponding to
+ Table 1 in the paper):
150
+
151
+ | Download Link | Model Descriptions |
152
+ |:--------------------------------------------------------:|:-------------------------------------------------------------------:|
153
+ | [Resolution_RAW_iHarmony4.pth][Resolution_RAW_iHarmony4] | Train by RSC strategy with original resolution iHarmony4 dataset |
154
+ | [Resolution_256_iHarmony4.pth][Resolution_256_iHarmony4] | Train with 256*256 resolution iHarmony4 dataset |
155
+ | [Resolution_RAW_HAdobe5K.pth][Resolution_RAW_HAdobe5K] | Train by RSC strategy with original resolution HAdobe5k subdataset |
156
+ | [Resolution_2048_HAdobe5K.pth][Resolution_2048_HAdobe5K] | Train by RSC strategy with 2048*2048 resolution HAdobe5k subdataset |
157
+ | [Resolution_1024_HAdobe5K.pth][Resolution_1024_HAdobe5K] | Train by RSC strategy with 1024*1024 resolution HAdobe5k subdataset |
158
+
159
+ [Resolution_RAW_iHarmony4]: https://drive.google.com/file/d/1O9faWNk54mIzMaGZ1tmgm0EJpH20a-Fl/view?usp=drive_link
160
+
161
+ [Resolution_256_iHarmony4]: https://drive.google.com/file/d/1xym96LTP9a75UseDWGW2KRN1gyl3HPyM/view?usp=sharing
162
+
163
+ [Resolution_RAW_HAdobe5K]: https://drive.google.com/file/d/1JeUS5inuOM0pASKfu-tK9K7E5pGkP570/view?usp=drive_link
164
+
165
+ [Resolution_2048_HAdobe5K]: https://drive.google.com/file/d/18RxTfZsPEoi6kSS_UVEsUBYRBHAl4MfB/view?usp=drive_link
166
+
167
+ [Resolution_1024_HAdobe5K]: https://drive.google.com/file/d/1cOY74mN8gIz66watyoobZ1knrigkQyb5/view?usp=sharing
168
+
169
+ ## Visualization GUI
170
+
171
+ We provide a GUI based on Gradio for visualizing the intermediate results of our method. You can run the following command to start it locally, or make use of our provided [Huggingface Space]().
172
+ ```bash
173
+ python app.py
174
+ ```
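+ **Note:** `app.py` looks for checkpoints in the `./pretrained_models` directory, so download the pretrained weights listed above and place them there before launching the GUI.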
175
+
176
+ ## Training
177
+
178
+ The intermediate output (including checkpoints, visualizations, and log.txt) will be saved in the directory `logs/exp`.
179
+
180
+ ### Train in low resolution (LR) mode
181
+
182
+ ```bash
183
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 256 --INR_input_size 256
184
+ ```
185
+
186
+ - `dataset_path`: the path of the iHarmony4 dataset.
187
+ - `base_size`: the size of the input image to encoder.
188
+ - `input_size`: the size of the target resolution.
189
+ - `INR_input_size`: the size of the input image to the INR decoder.
190
+ - `hr_train`: whether to train in high resolution (HR) mode, i.e., using RSC strategy (See Section 3.4 in the paper).
191
+ - `isFullRes`: whether to train in full/original resolution mode.
192
+
193
+ - (More parameter details can be found in the code.)
194
+
195
+ ### Train in high resolution (HR) mode (E.g., 2048x2048)
196
+
197
+ If you do **not use the RSC strategy**, the training command is as follows. (On a single RTX 3090, this leads to out-of-memory
+ even when `batch_size` is set to 2.)
199
+
200
+ ```bash
201
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048
202
+ ```
203
+
204
+ If you **use the RSC strategy**, the training command is as follows. (On a single RTX 3090, `batch_size` can be set up to 6.)
205
+
206
+ ```bash
207
+ python train.py --dataset_path {dataset_path} --base_size 256 --input_size 2048 --INR_input_size 2048 --hr_train
208
+ ```
209
+
210
+ ### Train in original resolution mode
211
+
212
+ ```bash
213
+ python train.py --dataset_path {dataset_path} --base_size 256 --hr_train --isFullRes
214
+ ```
215
+
216
+ ## Evaluation
217
+
218
+ The intermediate output (including visualizations and log.txt) will be saved in the directory `logs/test`.
219
+
220
+ **Notice:** Due to the resolution-agnostic characteristic of INR, you can evaluate the dataset at any resolution, no matter
+ which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.
222
+
223
+ ### Evaluation in low resolution (LR) mode
224
+
225
+ ```bash
226
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 256 --INR_input_size 256
227
+ ```
228
+
229
+ ### Evaluation in high resolution (HR) mode (E.g., 2048x2048)
230
+
231
+ ```bash
232
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --input_size 2048 --INR_input_size 2048
233
+ ```
234
+
235
+ ### Evaluation in original resolution mode
236
+
237
+ ```bash
238
+ python inference.py --dataset_path {dataset_path} --pretrained {pretrained_weight} --base_size 256 --hr_train --isFullRes
239
+ ```
240
+
241
+ ## Inference
242
+
243
+ We have provided demo images (2K and 6K) in [demo](demo). Feel free to play around with them.
244
+
245
+ **Notice:** Due to the resolution-agnostic characteristic of INR, you can run inference on images at any resolution, no matter
+ which resolution the model was trained on. Please refer to Table 4 and Table 5 in the paper.
247
+
248
+ ### Inference on square images (fast & low cost)
249
+
250
+ If you want to run inference on square images, please use the command below. Note that this script only supports square images whose resolution is a multiple of 256. Any other requirements will be printed to the console (on error) when you run the code.
251
+
252
+ ```bash
253
+ python efficient_inference_for_square_image.py --split_resolution {split_resolution} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
254
+ ```
255
+ - `split_resolution`: the resolution of the split patches (e.g., 512 means the input image will be split into 512x512 patches; see the sketch after this list). These patches are finally assembled back to the resolution of the original image.
256
+ - `composite_image`: the path of the composite image. You can try with the provided images in [demo](demo).
257
+ - `mask`: the path of the mask. You can try with the provided masks in [demo](demo).
258
+ - `save_path`: the path of the output image.
259
+ - `pretrained`: the path of the pretrained weight.
260
+
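+ To make the splitting arithmetic concrete, here is a small illustrative sketch of how `split_resolution` determines the patch grid for a square image, mirroring the logic in [efficient_inference_for_square_image.py](efficient_inference_for_square_image.py) (the helper name is only for illustration):
+
+ ```python
+ # Illustrative sketch of the patch grid implied by --split_resolution.
+ import math
+
+ def split_start_points(side, split_resolution):
+     # Top-left corners of the split_resolution x split_resolution patches covering
+     # a side x side image; the last row/column is shifted back so that patches
+     # never run past the image border.
+     num = math.ceil(side / split_resolution)
+     points = []
+     for i in range(num):
+         for j in range(num):
+             y = min(i * split_resolution, side - split_resolution)
+             x = min(j * split_resolution, side - split_resolution)
+             points.append((y, x))
+     return points
+
+ # A 2048x2048 composite with --split_resolution 512 gives a 4x4 grid of 16 patches.
+ print(len(split_start_points(2048, 512)))  # 16
+ ```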
261
+ ### Inference on arbitrary resolution images (slow, high cost, but support any resolution)
262
+ If the former inference script cannot meet your needs and you want to run inference on arbitrary resolution images, please use the command below. Note that this script is slower and costs more memory at the same resolution (***but it supports arbitrary resolutions***).
263
+
264
+ If you encounter an out-of-memory error, please try to increase the `split_num` parameter below. (The script also prints hints that guide you to do this.)
265
+ ```bash
266
+ python inference_for_arbitrary_resolution_image.py --split_num {split_num} --composite_image {composite_image_path} --mask {mask_path} --save_path {save_path} --pretrained {pretrained_weight}
267
+ ```
268
+ - `split_num`: the number of splits for the input image. (E.g., 4 means the input image will be split into 4x4=16 patches.)
269
+ - `composite_image`: the path of the composite image. You can try with the provided images in [demo](demo).
270
+ - `mask`: the path of the mask. You can try with the provided masks in [demo](demo).
271
+ - `save_path`: the path of the output image.
272
+ - `pretrained`: the path of the pretrained weight.
273
+
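+ As a rule of thumb, `split_num = N` yields an N x N grid of patches, so each forward pass only covers roughly 1/N^2 of the pixels; increasing `split_num` therefore lowers peak memory at the cost of more (smaller) forward passes.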
274
+ ## Results
275
+
276
+ ![Metrics](assets/metrics.png#pic_center)
277
+ ![Visual comparisons](assets/visualizations.png#pic_center)
278
+ ![Visual comparisons2](assets/visualizations2.png#pic_center)
279
+
280
+ ## Citation & Acknowledgments
281
+
282
+ If you find this paper useful in your research, please consider citing:
283
+
284
+ ```
285
+ @article{chen2023dense,
286
+ title={Dense Pixel-to-Pixel Harmonization via Continuous Image Representation},
287
+ author={Chen, Jianqi and Zhang, Yilan and Zou, Zhengxia and Chen, Keyan and Shi, Zhenwei},
288
+ journal={arXiv preprint arXiv:2303.01681},
289
+ year={2023}
290
+ }
291
+ ```
292
+
293
+ ## License
294
+
295
+ This project is licensed under the Apache-2.0 license. See [LICENSE](LICENSE) for details.
app.py ADDED
@@ -0,0 +1,279 @@
1
+ import os
2
+
3
+ import cv2
4
+
5
+ import gradio as gr
6
+ import numpy as np
7
+ import sys
8
+ import io
9
+ import torch
10
+
11
+
12
+ class Logger:
13
+ def __init__(self):
14
+ self.terminal = sys.stdout
15
+ self.log = io.BytesIO()
16
+
17
+ def write(self, message):
18
+ self.terminal.write(message)
19
+ self.log.write(bytes(message, encoding='utf-8'))
20
+
21
+ def flush(self):
22
+ self.terminal.flush()
23
+ self.log.flush()
24
+
25
+ def isatty(self):
26
+ return False
27
+
28
+
29
+ log = Logger()
30
+ sys.stdout = log
31
+
32
+ def read_logs():
33
+ out = log.log.getvalue().decode()
34
+ if out.count("\n") >= 30:
35
+ log.log = io.BytesIO()
36
+ sys.stdout.flush()
37
+ return out
38
+
39
+
40
+ with gr.Blocks() as app:
41
+ gr.Markdown("""
42
+ # HINet (or INR-Harmonization) - A novel image Harmonization method based on Implicit neural Networks
43
+ ## Harmonize any image you want! Arbitrary resolution, and arbitrary aspect ratio!
44
+ ### Official Gradio Demo
45
+ **Since Gradio Spaces only support CPU, the speed may be somewhat slow. You may prefer to download the code and run it locally with a GPU.**
46
+ <a href="https://huggingface.co/spaces/WindVChen/INR-Harmon?duplicate=true" style="display: inline-block;margin-top: .5em;margin-right: .25em;" target="_blank">
47
+ <img style="margin-bottom: 0em;display: inline;margin-top: -.25em;" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a> for no queue on your own hardware.</p>
48
+ * Official Repo: [INR-Harmonization](https://github.com/WindVChen/INR-Harmonization)
49
+ """)
50
+
51
+ valid_checkpoints_dict = {"Resolution_256_iHarmony4": "Resolution_256_iHarmony4.pth",
52
+ "Resolution_1024_HAdobe5K": "Resolution_1024_HAdobe5K.pth",
53
+ "Resolution_2048_HAdobe5K": "Resolution_2048_HAdobe5K.pth",
54
+ "Resolution_RAW_HAdobe5K": "Resolution_RAW_HAdobe5K.pth",
55
+ "Resolution_RAW_iHarmony4": "Resolution_RAW_iHarmony4.pth"}
56
+
57
+ global_state = gr.State({
58
+ 'pretrained_weight': valid_checkpoints_dict["Resolution_RAW_iHarmony4"],
59
+
60
+ })
61
+ with gr.Row():
62
+ form_composite_image = gr.Image(label='Input Composite image', type='pil').style(height="auto")
63
+ form_mask_image = gr.Image(label='Input Mask image', type='pil', interactive=False).style(
64
+ height="auto")
65
+ with gr.Row():
66
+ with gr.Column(scale=4):
67
+ with gr.Row():
68
+ with gr.Column(scale=2, min_width=10):
69
+ gr.Markdown(value='Model Selection', show_label=False)
70
+
71
+ with gr.Column(scale=4, min_width=10):
72
+ form_pretrained_dropdown = gr.Dropdown(
73
+ choices=list(valid_checkpoints_dict.values()),
74
+ label="Pretrained Model",
75
+ value=valid_checkpoints_dict["Resolution_RAW_iHarmony4"],
76
+ interactive=True
77
+ )
78
+
79
+ with gr.Row():
80
+ with gr.Column(scale=2, min_width=10):
81
+ gr.Markdown(value='Inference Mode', show_label=False)
82
+
83
+ with gr.Column(scale=4, min_width=10):
84
+ form_inference_mode = gr.Radio(
85
+ ['Square Image', 'Arbitrary Image'],
86
+ value='Arbitrary Image',
87
+ interactive=False,
88
+ label='Mode',
89
+ )
90
+
91
+ with gr.Row():
92
+ with gr.Column(scale=2, min_width=10):
93
+ gr.Markdown(value='Split Parameter', show_label=False)
94
+
95
+ with gr.Column(scale=4, min_width=10):
96
+ form_split_res = gr.Slider(
97
+ minimum=0,
98
+ maximum=2048,
99
+ step=128,
100
+ value=256,
101
+ interactive=False,
102
+ label="Split Resolution",
103
+ )
104
+ form_split_num = gr.Number(
105
+ value=8,
106
+ interactive=False,
107
+ label="Split Number")
108
+ with gr.Row():
109
+ form_log = gr.Textbox(read_logs, label="Logs", interactive=False, type="text", every=1)
110
+
111
+ with gr.Column(scale=4):
112
+ form_harmonized_image = gr.Image(label='Harmonized Result', type='numpy', interactive=False).style(
113
+ height="auto")
114
+ form_start_btn = gr.Button("Start Harmonization", interactive=False)
115
+ form_reset_btn = gr.Button("Reset", interactive=True)
116
+
117
+
118
+ def on_change_form_composite_image(form_composite_image):
119
+ if form_composite_image is None:
120
+ return gr.update(interactive=False, value=None), gr.update(value=None)
121
+ return gr.update(interactive=True), gr.update(value=None)
122
+
123
+
124
+ def on_change_form_mask_image(form_composite_image, form_mask_image):
125
+ if form_mask_image is None:
126
+ return gr.update(interactive=False if form_composite_image is None else True), gr.update(
127
+ interactive=False), gr.update(interactive=False), gr.update(
128
+ interactive=False), gr.update(interactive=False), gr.update(value=None)
129
+
130
+ if form_composite_image.size[:2] != form_mask_image.size[:2]:
131
+ raise gr.Error("Composite image and mask image should have the same resolution!")
132
+ else:
133
+ w, h = form_composite_image.size[:2]
134
+ if h != w or (h % 16 != 0):
135
+ return gr.update(value='Arbitrary Image', interactive=False), gr.update(interactive=True), gr.update(
136
+ interactive=True), gr.update(interactive=True), gr.update(interactive=False,
137
+ value=-1), gr.update(value=None)
138
+ else:
139
+ return gr.update(value='Square Image', interactive=True), gr.update(interactive=True), gr.update(
140
+ interactive=True), gr.update(interactive=False), gr.update(interactive=True,
141
+ value=h // 16,
142
+ maximum=h,
143
+ minimum=h // 16,
144
+ step=h // 16), gr.update(value=None)
145
+
146
+
147
+ form_composite_image.change(
148
+ on_change_form_composite_image,
149
+ inputs=[form_composite_image],
150
+ outputs=[form_mask_image, form_harmonized_image]
151
+ )
152
+
153
+ form_mask_image.change(
154
+ on_change_form_mask_image,
155
+ inputs=[form_composite_image, form_mask_image],
156
+ outputs=[form_inference_mode, form_mask_image, form_start_btn, form_split_num, form_split_res,
157
+ form_harmonized_image]
158
+ )
159
+
160
+
161
+ def on_change_form_split_num(form_composite_image, form_split_num):
162
+ w, h = form_composite_image.size[:2]
163
+ if form_split_num < 1:
164
+ return gr.update(value=1)
165
+ elif form_split_num > min(w, h):
166
+ return gr.update(value=min(w, h))
167
+ else:
168
+ return gr.update(value=form_split_num)
169
+
170
+
171
+ form_split_num.change(
172
+ on_change_form_split_num,
173
+ inputs=[form_composite_image, form_split_num],
174
+ outputs=[form_split_num]
175
+ )
176
+
177
+
178
+ def on_change_form_inference_mode(form_inference_mode):
179
+ if form_inference_mode == "Square Image":
180
+ return gr.update(interactive=True), gr.update(interactive=False)
181
+ else:
182
+ return gr.update(interactive=False), gr.update(interactive=True)
183
+
184
+
185
+ form_inference_mode.change(on_change_form_inference_mode, inputs=[form_inference_mode],
186
+ outputs=[form_split_res, form_split_num])
187
+
188
+
189
+ def on_click_form_start_btn(form_composite_image, form_mask_image, form_pretrained_dropdown, form_inference_mode,
190
+ form_split_res, form_split_num):
191
+ log.log = io.BytesIO()
192
+ if form_inference_mode == "Square Image":
193
+ from efficient_inference_for_square_image import parse_args, main_process
194
+ opt = parse_args()
195
+ opt.transform_mean = [.5, .5, .5]
196
+ opt.transform_var = [.5, .5, .5]
197
+ opt.pretrained = os.path.join("./pretrained_models", form_pretrained_dropdown)
198
+ opt.split_resolution = form_split_res
199
+ opt.save_path = None
200
+ opt.workers = 0
201
+ opt.device = "cuda" if torch.cuda.is_available() else "cpu"
202
+
203
+ composite_image = np.asarray(form_composite_image)
204
+ mask = np.asarray(form_mask_image)
205
+
206
+ try:
207
+ return cv2.cvtColor(
208
+ main_process(opt, composite_image=composite_image, mask=mask),
209
+ cv2.COLOR_BGR2RGB)
210
+ except:
211
+ raise gr.Error("Patches too big. Try to reduce the `split_res`!")
212
+
213
+ else:
214
+ from inference_for_arbitrary_resolution_image import parse_args, main_process
215
+ opt = parse_args()
216
+ opt.transform_mean = [.5, .5, .5]
217
+ opt.transform_var = [.5, .5, .5]
218
+ opt.pretrained = os.path.join("./pretrained_models", form_pretrained_dropdown)
219
+ opt.split_num = int(form_split_num)
220
+ opt.save_path = None
221
+ opt.workers = 0
222
+ opt.device = "cuda" if torch.cuda.is_available() else "cpu"
223
+
224
+ composite_image = np.asarray(form_composite_image)
225
+ mask = np.asarray(form_mask_image)
226
+
227
+ try:
228
+ return cv2.cvtColor(
229
+ main_process(opt, composite_image=composite_image, mask=mask),
230
+ cv2.COLOR_BGR2RGB)
231
+ except:
232
+ raise gr.Error("Patches too big. Try to increase the `split_num`!")
233
+
234
+
235
+ form_start_btn.click(on_click_form_start_btn,
236
+ inputs=[form_composite_image, form_mask_image, form_pretrained_dropdown, form_inference_mode,
237
+ form_split_res, form_split_num], outputs=[form_harmonized_image])
238
+
239
+
240
+ def on_click_form_reset_btn():
241
+ log.log = io.BytesIO()
242
+ return gr.update(value=None), gr.update(value=None, interactive=True), gr.update(value=None,
243
+ interactive=False), gr.update(
244
+ interactive=False)
245
+
246
+
247
+ form_reset_btn.click(on_click_form_reset_btn,
248
+ inputs=None, outputs=[form_log, form_composite_image, form_mask_image, form_start_btn])
249
+
250
+ gr.Markdown("""
251
+ ## Quick Start
252
+ 1. Select desired `Pretrained Model`.
253
+ 2. Select a composite image, and then a mask with the same size.
254
+ 3. Select the inference mode (for non-square images, only `Arbitrary Image` is supported).
255
+ 4. Set `Split Resolution` (the patches' resolution) or `Split Number` (how many patches, about N*N) according to the inference mode.
+ 5. Click `Start Harmonization` and enjoy it!
257
+
258
+ """)
259
+ gr.HTML("""
260
+ <style>
261
+ .container {
262
+ position: absolute;
263
+ height: 50px;
264
+ text-align: center;
265
+ line-height: 50px;
266
+ width: 100%;
267
+ }
268
+ </style>
269
+ <div class="container">
270
+ Gradio demo supported by
271
+ <a href="https://github.com/WindVChen">WindVChen</a>
272
+ </div>
273
+ """)
274
+
275
+ gr.close_all()
276
+
277
+ app.queue(concurrency_count=1, max_size=200, api_open=False)
278
+
279
+ app.launch(show_api=False, server_port=12345)
efficient_inference_for_square_image.py ADDED
@@ -0,0 +1,345 @@
1
+ import argparse
2
+
3
+ import torch.backends.cudnn as cudnn
4
+ import torchvision.transforms as transforms
5
+ from torch.utils.data import DataLoader
6
+
7
+ from model.build_model import build_model
8
+
9
+ import torch
10
+ import cv2
11
+ import numpy as np
12
+ import torchvision
13
+ import os
14
+ import tqdm
15
+ import time
16
+
17
+ from utils.misc import prepare_cooridinate_input, customRandomCrop
18
+
19
+ from datasets.build_INR_dataset import Implicit2DGenerator
20
+ import albumentations
21
+ from albumentations import Resize
22
+ from torch.utils.data import DataLoader
23
+ from utils.misc import normalize
24
+
25
+ import math
26
+
27
+
28
+ class single_image_dataset(torch.utils.data.Dataset):
29
+ def __init__(self, opt, composite_image=None, mask=None):
30
+ super().__init__()
31
+
32
+ self.opt = opt
33
+
34
+ if composite_image is None:
35
+ composite_image = cv2.imread(opt.composite_image)
36
+ composite_image = cv2.cvtColor(composite_image, cv2.COLOR_BGR2RGB)
37
+ self.composite_image = composite_image
38
+
39
+ assert composite_image.shape[0] == composite_image.shape[1], "This faster script only supports square images."
40
+ assert composite_image.shape[
41
+ 0] % 256 == 0, "This faster script only supports images with resolution multiples of 256."
42
+ assert opt.split_resolution % (composite_image.shape[
43
+ 0] // 16) == 0, f"The image resolution is {composite_image.shape[0]}, " \
44
+ f"you should set {opt.split_resolution} to multiplies of {composite_image.shape[0] // 16}"
45
+
46
+ if mask is None:
47
+ mask = cv2.imread(opt.mask)
48
+ mask = mask[:, :, 0].astype(np.float32) / 255.
49
+ self.mask = mask
50
+
51
+ self.torch_transforms = transforms.Compose([transforms.ToTensor(),
52
+ transforms.Normalize([.5, .5, .5], [.5, .5, .5])])
53
+ self.INR_dataset = Implicit2DGenerator(opt, 'Val')
54
+
55
+ self.split_width_resolution = self.split_height_resolution = opt.split_resolution
56
+
57
+ self.num_w = math.ceil(composite_image.shape[1] / self.split_width_resolution)
58
+ self.num_h = math.ceil(composite_image.shape[0] / self.split_height_resolution)
59
+
60
+ self.split_start_point = []
61
+
62
+ "Split the image into several parts."
63
+ for i in range(self.num_h):
64
+ for j in range(self.num_w):
65
+ if i == composite_image.shape[0] // self.split_height_resolution:
66
+ if j == composite_image.shape[1] // self.split_width_resolution:
67
+ self.split_start_point.append((composite_image.shape[0] - self.split_height_resolution,
68
+ composite_image.shape[1] - self.split_width_resolution))
69
+ else:
70
+ self.split_start_point.append(
71
+ (composite_image.shape[0] - self.split_height_resolution, j * self.split_width_resolution))
72
+ else:
73
+ if j == composite_image.shape[1] // self.split_width_resolution:
74
+ self.split_start_point.append(
75
+ (i * self.split_height_resolution, composite_image.shape[1] - self.split_width_resolution))
76
+ else:
77
+ self.split_start_point.append(
78
+ (i * self.split_height_resolution, j * self.split_width_resolution))
79
+
80
+ assert len(self.split_start_point) == self.num_w * self.num_h
81
+
82
+ print(
83
+ f"The image will be split into {self.num_h} pieces in height, and {self.num_w} pieces in width. Totally {self.num_h * self.num_w} patches.")
84
+ print(f"The final resolution of each patch is {self.split_height_resolution} x {self.split_width_resolution}")
85
+
86
+ def __len__(self):
87
+ return self.num_w * self.num_h
88
+
89
+ def __getitem__(self, idx):
90
+ composite_image = self.composite_image
91
+
92
+ mask = self.mask
93
+
94
+ full_coord = prepare_cooridinate_input(mask).transpose(1, 2, 0)
95
+
96
+ tmp_transform = albumentations.Compose([Resize(self.opt.base_size, self.opt.base_size)],
97
+ additional_targets={'object_mask': 'image'})
98
+ transform_out = tmp_transform(image=self.composite_image, object_mask=self.mask)
99
+ compos_list = [self.torch_transforms(transform_out['image'])]
100
+ mask_list = [
101
+ torchvision.transforms.ToTensor()(transform_out['object_mask'][..., np.newaxis].astype(np.float32))]
102
+ coord_map_list = []
103
+
104
+ if composite_image.shape[0] != self.split_height_resolution:
105
+ c_h = self.split_start_point[idx][0] / (composite_image.shape[0] - self.split_height_resolution)
106
+ else:
107
+ c_h = 0
108
+ if composite_image.shape[1] != self.split_width_resolution:
109
+ c_w = self.split_start_point[idx][1] / (composite_image.shape[1] - self.split_width_resolution)
110
+ else:
111
+ c_w = 0
112
+ transform_out, c_h, c_w = customRandomCrop([composite_image, mask, full_coord],
113
+ self.split_height_resolution, self.split_width_resolution, c_h, c_w)
114
+
115
+ compos_list.append(self.torch_transforms(transform_out[0]))
116
+ mask_list.append(
117
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
118
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
119
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
120
+ for n in range(2):
121
+ tmp_comp = cv2.resize(composite_image, (
122
+ composite_image.shape[1] // 2 ** (n + 1), composite_image.shape[0] // 2 ** (n + 1)))
123
+ tmp_mask = cv2.resize(mask, (mask.shape[1] // 2 ** (n + 1), mask.shape[0] // 2 ** (n + 1)))
124
+ tmp_coord = prepare_cooridinate_input(tmp_mask).transpose(1, 2, 0)
125
+
126
+ transform_out, c_h, c_w = customRandomCrop([tmp_comp, tmp_mask, tmp_coord],
127
+ self.split_height_resolution // 2 ** (n + 1),
128
+ self.split_width_resolution // 2 ** (n + 1), c_h, c_w)
129
+ compos_list.append(self.torch_transforms(transform_out[0]))
130
+ mask_list.append(
131
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
132
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
133
+ out_comp = compos_list
134
+ out_mask = mask_list
135
+ out_coord = coord_map_list
136
+
137
+ fg_INR_coordinates, bg_INR_coordinates, fg_INR_RGB, fg_transfer_INR_RGB, bg_INR_RGB = self.INR_dataset.generator(
138
+ self.torch_transforms, transform_out[0], transform_out[0], mask)
139
+
140
+ return {
141
+ 'composite_image': out_comp,
142
+ 'mask': out_mask,
143
+ 'coordinate_map': out_coord,
144
+ 'composite_image0': out_comp[0],
145
+ 'mask0': out_mask[0],
146
+ 'coordinate_map0': out_coord[0],
147
+ 'composite_image1': out_comp[1],
148
+ 'mask1': out_mask[1],
149
+ 'coordinate_map1': out_coord[1],
150
+ 'composite_image2': out_comp[2],
151
+ 'mask2': out_mask[2],
152
+ 'coordinate_map2': out_coord[2],
153
+ 'composite_image3': out_comp[3],
154
+ 'mask3': out_mask[3],
155
+ 'coordinate_map3': out_coord[3],
156
+ 'fg_INR_coordinates': fg_INR_coordinates,
157
+ 'bg_INR_coordinates': bg_INR_coordinates,
158
+ 'fg_INR_RGB': fg_INR_RGB,
159
+ 'fg_transfer_INR_RGB': fg_transfer_INR_RGB,
160
+ 'bg_INR_RGB': bg_INR_RGB,
161
+ 'start_point': self.split_start_point[idx],
162
+ 'start_proportion': [self.split_start_point[idx][0] / (composite_image.shape[0]),
163
+ self.split_start_point[idx][1] / (composite_image.shape[1]),
164
+ (self.split_start_point[idx][0] + self.split_height_resolution) / (
165
+ composite_image.shape[0]),
166
+ (self.split_start_point[idx][1] + self.split_width_resolution) / (
167
+ composite_image.shape[1])],
168
+ }
169
+
170
+
171
+ def parse_args():
172
+ parser = argparse.ArgumentParser()
173
+
174
+ parser.add_argument('--split_resolution', type=int, default=2048,
175
+ help='The resolution of the patch split.')
176
+
177
+ parser.add_argument('--composite_image', type=str, default=r'./demo/demo_2k_composite.jpg',
178
+ help='composite image path')
179
+
180
+ parser.add_argument('--mask', type=str, default=r'./demo/demo_2k_mask.jpg',
181
+ help='mask path')
182
+
183
+ parser.add_argument('--save_path', type=str, default=r'./demo/',
184
+ help='save path')
185
+
186
+ parser.add_argument('--workers', type=int, default=8,
187
+ metavar='N', help='Dataloader threads.')
188
+
189
+ parser.add_argument('--batch_size', type=int, default=1,
190
+ help='You can override model batch size by specify positive number.')
191
+
192
+ parser.add_argument('--device', type=str, default='cuda',
193
+ help="Whether use cuda, 'cuda' or 'cpu'.")
194
+
195
+ parser.add_argument('--base_size', type=int, default=256,
196
+ help='Base size. Resolution of the image input into the Encoder')
197
+
198
+ parser.add_argument('--input_size', type=int, default=256,
199
+ help='Input size. Resolution of the image that want to be generated by the Decoder')
200
+
201
+ parser.add_argument('--INR_input_size', type=int, default=256,
202
+ help='INR input size. Resolution of the image that want to be generated by the Decoder. '
203
+ 'Should be the same as `input_size`')
204
+
205
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
206
+ help='Number of channels for INR linear layer.')
207
+
208
+ parser.add_argument('--LUT_dim', type=int, default=7,
209
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
210
+
211
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
212
+ help='INR activation layer type: leakyrelu_pe, sine')
213
+
214
+ parser.add_argument('--pretrained', type=str,
215
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
216
+ help='Pretrained weight path')
217
+
218
+ parser.add_argument('--param_factorize_dim', type=int,
219
+ default=10,
220
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
221
+ 'Refer to https://arxiv.org/abs/2011.12026')
222
+
223
+ parser.add_argument('--embedding_type', type=str,
224
+ default="CIPS_embed",
225
+ help='Which embedding_type to use.')
226
+
227
+ parser.add_argument('--INRDecode', action="store_false",
228
+ help='Whether INR decoder. Set it to False if you want to test the baseline '
229
+ '(https://github.com/SamsungLabs/image_harmonization)')
230
+
231
+ parser.add_argument('--isMoreINRInput', action="store_false",
232
+ help='Whether to cat RGB and mask. See Section 3.4 in the paper.')
233
+
234
+ parser.add_argument('--hr_train', action="store_false",
235
+ help='Whether use hr_train. See section 3.4 in the paper.')
236
+
237
+ parser.add_argument('--isFullRes', action="store_true",
238
+ help='Whether for original resolution. See section 3.4 in the paper.')
239
+
240
+ opt = parser.parse_args()
241
+
242
+ assert opt.batch_size == 1, 'This faster script only supports batch size 1 for inference.'
243
+
244
+ return opt
245
+
246
+
247
+ @torch.no_grad()
248
+ def inference(model, opt, composite_image=None, mask=None):
249
+ model.eval()
250
+
251
+ "dataset here is actually consisted of several patches of a single image."
252
+ singledataset = single_image_dataset(opt, composite_image, mask)
253
+
254
+ single_data_loader = DataLoader(singledataset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
255
+ num_workers=opt.workers, persistent_workers=False if composite_image is not None else True)
256
+
257
+ "Init a pure black image with the same size as the input image."
258
+ init_img = np.zeros_like(singledataset.composite_image)
259
+
260
+ time_all = 0
261
+
262
+ for step, batch in tqdm.tqdm(enumerate(single_data_loader)):
263
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
264
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
265
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
266
+ start_points = batch['start_point']
267
+ start_proportion = batch['start_proportion']
268
+
269
+ if opt.batch_size == 1:
270
+ start_points = [torch.cat(start_points)]
271
+ start_proportion = [torch.cat(start_proportion)]
272
+
273
+ fg_INR_coordinates = coordinate_map[1:]
274
+
275
+ try:
276
+ if step == 0: # This is for CUDA Kernel Warm-up, or the first inference step will be quite slow.
277
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
278
+ composite_image,
279
+ mask,
280
+ fg_INR_coordinates, start_proportion[0]
281
+ )
282
+ if opt.device == "cuda":
283
+ torch.cuda.reset_max_memory_allocated()
284
+ torch.cuda.reset_max_memory_cached()
285
+ start_time = time.time()
286
+ torch.cuda.synchronize()
287
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
288
+ composite_image,
289
+ mask,
290
+ fg_INR_coordinates, start_proportion[0]
291
+ )
292
+ if opt.device == "cuda":
293
+ torch.cuda.synchronize()
294
+ end_time = time.time()
295
+
296
+ end_max_memory = torch.cuda.max_memory_allocated() // 1024 ** 2
297
+ end_memory = torch.cuda.memory_allocated() // 1024 ** 2
298
+
299
+ print(f'GPU max memory usage: {end_max_memory} MB')
300
+ print(f'GPU memory usage: {end_memory} MB')
301
+ time_all += (end_time - start_time)
302
+ print(f'progress: {step} / {len(single_data_loader)}')
303
+ except:
304
+ raise Exception(
305
+ f'The image resolution is too large. Please reduce the `split_resolution` value. Your current setting is {opt.split_resolution}.')
306
+
307
+ "Assemble the every patch's harmonized result into the final whole image."
308
+ for id in range(len(fg_INR_coordinates[0])):
309
+ pred_fg_image = fg_content_bg_appearance_construct[-1][id]
310
+ pred_harmonized_image = pred_fg_image * (mask[1][id] > 100 / 255.) + composite_image[1][id] * (
311
+ ~(mask[1][id] > 100 / 255.))
312
+
313
+ pred_harmonized_tmp = cv2.cvtColor(
314
+ normalize(pred_harmonized_image.unsqueeze(0), opt, 'inv')[0].permute(1, 2, 0).cpu().mul_(255.).clamp_(
315
+ 0., 255.).numpy().astype(np.uint8), cv2.COLOR_RGB2BGR)
316
+
317
+ init_img[start_points[id][0]:start_points[id][0] + singledataset.split_height_resolution,
318
+ start_points[id][1]:start_points[id][1] + singledataset.split_width_resolution] = pred_harmonized_tmp
319
+
320
+ print(f'Inference time: {time_all}')
321
+ if opt.save_path is not None:
322
+ os.makedirs(opt.save_path, exist_ok=True)
323
+ cv2.imwrite(os.path.join(opt.save_path, "pred_harmonized_image.jpg"), init_img)
324
+ return init_img
325
+
326
+
327
+ def main_process(opt, composite_image=None, mask=None):
328
+ cudnn.benchmark = True
329
+
330
+ model = build_model(opt).to(opt.device)
331
+
332
+ load_dict = torch.load(opt.pretrained)['model']
333
+ for k in load_dict.keys():
334
+ if k not in model.state_dict().keys():
335
+ print(f"Skip {k}")
336
+ model.load_state_dict(load_dict, strict=False)
337
+
338
+ return inference(model, opt, composite_image, mask)
339
+
340
+
341
+ if __name__ == '__main__':
342
+ opt = parse_args()
343
+ opt.transform_mean = [.5, .5, .5]
344
+ opt.transform_var = [.5, .5, .5]
345
+ main_process(opt)
inference.py ADDED
@@ -0,0 +1,236 @@
1
+ import os
2
+ import argparse
3
+
4
+ import albumentations
5
+ from albumentations import Resize
6
+
7
+ import torch
8
+ import torch.backends.cudnn as cudnn
9
+ import torchvision.transforms as transforms
10
+ from torch.utils.data import DataLoader
11
+
12
+ from model.build_model import build_model
13
+ from datasets.build_dataset import dataset_generator
14
+
15
+ from utils import misc, metrics
16
+
17
+
18
+ def parse_args():
19
+ parser = argparse.ArgumentParser()
20
+
21
+ parser.add_argument('--workers', type=int, default=1,
22
+ metavar='N', help='Dataloader threads.')
23
+
24
+ parser.add_argument('--batch_size', type=int, default=1,
25
+ help='You can override model batch size by specify positive number.')
26
+
27
+ parser.add_argument('--device', type=str, default='cuda',
28
+ help="Whether use cuda, 'cuda' or 'cpu'.")
29
+
30
+ parser.add_argument('--save_path', type=str, default="./logs",
31
+ help='Where to save logs and checkpoints.')
32
+
33
+ parser.add_argument('--dataset_path', type=str, default=r".\iHarmony4",
34
+ help='Dataset path.')
35
+
36
+ parser.add_argument('--base_size', type=int, default=256,
37
+ help='Base size. Resolution of the image input into the Encoder')
38
+
39
+ parser.add_argument('--input_size', type=int, default=256,
40
+ help='Input size. Resolution of the image to be generated by the Decoder.')
41
+
42
+ parser.add_argument('--INR_input_size', type=int, default=256,
43
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
44
+ 'Should be the same as `input_size`')
45
+
46
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
47
+ help='Number of channels for INR linear layer.')
48
+
49
+ parser.add_argument('--LUT_dim', type=int, default=7,
50
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
51
+
52
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
53
+ help='INR activation layer type: leakyrelu_pe, sine')
54
+
55
+ parser.add_argument('--pretrained', type=str,
56
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
57
+ help='Pretrained weight path')
58
+
59
+ parser.add_argument('--param_factorize_dim', type=int,
60
+ default=10,
61
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
62
+ 'Refer to https://arxiv.org/abs/2011.12026')
63
+
64
+ parser.add_argument('--embedding_type', type=str,
65
+ default="CIPS_embed",
66
+ help='Which embedding_type to use.')
67
+
68
+ parser.add_argument('--optim', type=str,
69
+ default='adamw',
70
+ help='Which optimizer to use.')
71
+
72
+ parser.add_argument('--INRDecode', action="store_false",
73
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
74
+ '(https://github.com/SamsungLabs/image_harmonization)')
75
+
76
+ parser.add_argument('--isMoreINRInput', action="store_false",
77
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
78
+
79
+ parser.add_argument('--hr_train', action="store_true",
80
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
81
+
82
+ parser.add_argument('--isFullRes', action="store_true",
83
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
84
+
85
+ opt = parser.parse_args()
86
+
87
+ opt.save_path = misc.increment_path(os.path.join(opt.save_path, "test1"))
88
+
89
+ return opt
90
+
91
+
92
+ def inference(val_loader, model, logger, opt):
93
+ current_process = 10
94
+ model.eval()
95
+
96
+ metric_log = {
97
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
98
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
99
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
100
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
101
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
102
+ }
103
+
104
+ lut_metric_log = {
105
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
106
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
107
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
108
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
109
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
110
+ }
111
+
112
+ for step, batch in enumerate(val_loader):
113
+ composite_image = batch['composite_image'].to(opt.device)
114
+ real_image = batch['real_image'].to(opt.device)
115
+ mask = batch['mask'].to(opt.device)
116
+ category = batch['category']
117
+
118
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
119
+
120
+ with torch.no_grad():
121
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
122
+ composite_image,
123
+ mask,
124
+ fg_INR_coordinates,
125
+ )
126
+
127
+ if opt.INRDecode:
128
+ pred_fg_image = fg_content_bg_appearance_construct[-1]
129
+ else:
130
+ pred_fg_image = misc.lin2img(fg_content_bg_appearance_construct,
131
+ val_loader.dataset.INR_dataset.size) if fg_content_bg_appearance_construct is not None else None
132
+
133
+ if not opt.INRDecode:
134
+ pred_harmonized_image = None
135
+ else:
136
+ pred_harmonized_image = pred_fg_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
137
+ lut_transform_image = lut_transform_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
138
+
139
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
140
+ pred_harmonized_image, lut_transform_image, opt, -1, show=False,
141
+ wandb=False, isAll=True, step=step)
142
+
143
+ if opt.INRDecode:
144
+ mse, fmse, psnr, ssim = metrics.calc_metrics(misc.normalize(pred_harmonized_image, opt, 'inv'),
145
+ misc.normalize(real_image, opt, 'inv'), mask)
146
+
147
+ lut_mse, lut_fmse, lut_psnr, lut_ssim = metrics.calc_metrics(misc.normalize(lut_transform_image, opt, 'inv'),
148
+ misc.normalize(real_image, opt, 'inv'), mask)
149
+
150
+ for idx in range(len(category)):
151
+ if opt.INRDecode:
152
+ metric_log[category[idx]]['Samples'] += 1
153
+ metric_log[category[idx]]['MSE'] += mse[idx]
154
+ metric_log[category[idx]]['fMSE'] += fmse[idx]
155
+ metric_log[category[idx]]['PSNR'] += psnr[idx]
156
+ metric_log[category[idx]]['SSIM'] += ssim[idx]
157
+
158
+ metric_log['All']['Samples'] += 1
159
+ metric_log['All']['MSE'] += mse[idx]
160
+ metric_log['All']['fMSE'] += fmse[idx]
161
+ metric_log['All']['PSNR'] += psnr[idx]
162
+ metric_log['All']['SSIM'] += ssim[idx]
163
+
164
+ lut_metric_log[category[idx]]['Samples'] += 1
165
+ lut_metric_log[category[idx]]['MSE'] += lut_mse[idx]
166
+ lut_metric_log[category[idx]]['fMSE'] += lut_fmse[idx]
167
+ lut_metric_log[category[idx]]['PSNR'] += lut_psnr[idx]
168
+ lut_metric_log[category[idx]]['SSIM'] += lut_ssim[idx]
169
+
170
+ lut_metric_log['All']['Samples'] += 1
171
+ lut_metric_log['All']['MSE'] += lut_mse[idx]
172
+ lut_metric_log['All']['fMSE'] += lut_fmse[idx]
173
+ lut_metric_log['All']['PSNR'] += lut_psnr[idx]
174
+ lut_metric_log['All']['SSIM'] += lut_ssim[idx]
175
+
176
+ if (step + 1) / len(val_loader) * 100 >= current_process:
177
+ logger.info(f'Processing: {current_process}')
178
+ current_process += 10
179
+
180
+ logger.info('=========================')
181
+ for key in metric_log.keys():
182
+ if opt.INRDecode:
183
+ msg = f"{key}-'MSE': {metric_log[key]['MSE'] / metric_log[key]['Samples']:.2f}\n" \
184
+ f"{key}-'fMSE': {metric_log[key]['fMSE'] / metric_log[key]['Samples']:.2f}\n" \
185
+ f"{key}-'PSNR': {metric_log[key]['PSNR'] / metric_log[key]['Samples']:.2f}\n" \
186
+ f"{key}-'SSIM': {metric_log[key]['SSIM'] / metric_log[key]['Samples']:.4f}\n" \
187
+ f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
188
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
189
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
190
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
191
+ else:
192
+ msg = f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
193
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
194
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
195
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
196
+
197
+ logger.info(msg)
198
+
199
+ logger.info('=========================')
200
+
201
+
202
+ def main_process(opt):
203
+ logger = misc.create_logger(os.path.join(opt.save_path, "log.txt"))
204
+ cudnn.benchmark = True
205
+
206
+ valset_path = os.path.join(opt.dataset_path, "IHD_test.txt")
207
+
208
+ opt.transform_mean = [.5, .5, .5]
209
+ opt.transform_var = [.5, .5, .5]
210
+ torch_transform = transforms.Compose([transforms.ToTensor(),
211
+ transforms.Normalize(opt.transform_mean, opt.transform_var)])
212
+
213
+ valset_alb_transform = albumentations.Compose([Resize(opt.input_size, opt.input_size)],
214
+ additional_targets={'real_image': 'image', 'object_mask': 'image'})
215
+
216
+ valset = dataset_generator(valset_path, valset_alb_transform, torch_transform, opt, mode='Val')
217
+
218
+ val_loader = DataLoader(valset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
219
+ num_workers=opt.workers, persistent_workers=True)
220
+
221
+ model = build_model(opt).to(opt.device)
222
+ logger.info(f"Load pretrained weight from {opt.pretrained}")
223
+
224
+ load_dict = torch.load(opt.pretrained)['model']
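+ # The loop below only reports checkpoint keys that have no counterpart in the model; loading with strict=False then skips them.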
225
+ for k in load_dict.keys():
226
+ if k not in model.state_dict().keys():
227
+ print(f"Skip {k}")
228
+ model.load_state_dict(load_dict, strict=False)
229
+
230
+ inference(val_loader, model, logger, opt)
231
+
232
+
233
+ if __name__ == '__main__':
234
+ opt = parse_args()
235
+ os.makedirs(opt.save_path, exist_ok=True)
236
+ main_process(opt)
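+ # Example launch (paths are placeholders): python inference.py --dataset_path /path/to/iHarmony4 --pretrained /path/to/Resolution_RAW_iHarmony4.pth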
inference_for_arbitrary_resolution_image.py ADDED
@@ -0,0 +1,337 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+
3
+ import torch.backends.cudnn as cudnn
4
+ import torchvision.transforms as transforms
5
+ from torch.utils.data import DataLoader
6
+
7
+ from model.build_model import build_model
8
+
9
+ import torch
10
+ import cv2
11
+ import numpy as np
12
+ import torchvision
13
+ import os
14
+ import tqdm
15
+ import time
16
+
17
+ from utils.misc import prepare_cooridinate_input, customRandomCrop
18
+
19
+ from datasets.build_INR_dataset import Implicit2DGenerator
20
+ import albumentations
21
+ from albumentations import Resize
22
+ from torch.utils.data import DataLoader
23
+ from utils.misc import normalize
24
+
25
+ import math
26
+
27
+
28
+ class single_image_dataset(torch.utils.data.Dataset):
29
+ def __init__(self, opt, composite_image=None, mask=None):
30
+ super().__init__()
31
+
32
+ self.opt = opt
33
+
34
+ if composite_image is None:
35
+ composite_image = cv2.imread(opt.composite_image)
36
+ composite_image = cv2.cvtColor(composite_image, cv2.COLOR_BGR2RGB)
37
+ self.composite_image = composite_image
38
+
39
+ if mask is None:
40
+ mask = cv2.imread(opt.mask)
41
+ mask = mask[:, :, 0].astype(np.float32) / 255.
42
+ self.mask = mask
43
+
44
+ self.torch_transforms = transforms.Compose([transforms.ToTensor(),
45
+ transforms.Normalize([.5, .5, .5], [.5, .5, .5])])
46
+ self.INR_dataset = Implicit2DGenerator(opt, 'Val')
47
+
48
+ self.split_width_resolution = composite_image.shape[1] // opt.split_num
49
+ self.split_height_resolution = composite_image.shape[0] // opt.split_num
50
+
51
+ self.split_width_resolution = self.split_height_resolution = min(self.split_width_resolution,
52
+ self.split_height_resolution)
53
+
54
+ if self.split_width_resolution % 4 != 0:
55
+ self.split_width_resolution = self.split_width_resolution + (4 - self.split_width_resolution % 4)
56
+
57
+ if self.split_height_resolution % 4 != 0:
58
+ self.split_height_resolution = self.split_height_resolution + (4 - self.split_height_resolution % 4)
59
+
60
+ self.num_w = math.ceil(composite_image.shape[1] / self.split_width_resolution)
61
+ self.num_h = math.ceil(composite_image.shape[0] / self.split_height_resolution)
62
+
63
+ self.split_start_point = []
64
+
65
+ "Split the image into several parts."
66
+ for i in range(self.num_h):
67
+ for j in range(self.num_w):
68
+ if i == composite_image.shape[0] // self.split_height_resolution:
69
+ if j == composite_image.shape[1] // self.split_width_resolution:
70
+ self.split_start_point.append((composite_image.shape[0] - self.split_height_resolution,
71
+ composite_image.shape[1] - self.split_width_resolution))
72
+ else:
73
+ self.split_start_point.append(
74
+ (composite_image.shape[0] - self.split_height_resolution, j * self.split_width_resolution))
75
+ else:
76
+ if j == composite_image.shape[1] // self.split_width_resolution:
77
+ self.split_start_point.append(
78
+ (i * self.split_height_resolution, composite_image.shape[1] - self.split_width_resolution))
79
+ else:
80
+ self.split_start_point.append(
81
+ (i * self.split_height_resolution, j * self.split_width_resolution))
82
+
83
+ assert len(self.split_start_point) == self.num_w * self.num_h
84
+
85
+ print(
86
+ f"The image will be split into {self.num_h} pieces in height, and {self.num_w} pieces in width. Totally {self.num_h * self.num_w} patches.")
87
+ print(f"The final resolution of each patch is {self.split_height_resolution} x {self.split_width_resolution}")
88
+
89
+ def __len__(self):
90
+ return self.num_w * self.num_h
91
+
92
+ def __getitem__(self, idx):
93
+ composite_image = self.composite_image
94
+
95
+ mask = self.mask
96
+
97
+ full_coord = prepare_cooridinate_input(mask).transpose(1, 2, 0)
98
+
99
+ tmp_transform = albumentations.Compose([Resize(self.opt.base_size, self.opt.base_size)],
100
+ additional_targets={'object_mask': 'image'})
101
+ transform_out = tmp_transform(image=composite_image, object_mask=mask)
102
+ compos_list = [self.torch_transforms(transform_out['image'])]
103
+ mask_list = [
104
+ torchvision.transforms.ToTensor()(transform_out['object_mask'][..., np.newaxis].astype(np.float32))]
105
+ coord_map_list = []
106
+
107
+ if composite_image.shape[0] != self.split_height_resolution:
108
+ c_h = self.split_start_point[idx][0] / (composite_image.shape[0] - self.split_height_resolution)
109
+ else:
110
+ c_h = 0
111
+ if composite_image.shape[1] != self.split_width_resolution:
112
+ c_w = self.split_start_point[idx][1] / (composite_image.shape[1] - self.split_width_resolution)
113
+ else:
114
+ c_w = 0
115
+ transform_out, c_h, c_w = customRandomCrop([composite_image, mask, full_coord],
116
+ self.split_height_resolution, self.split_width_resolution, c_h, c_w)
117
+
118
+ compos_list.append(self.torch_transforms(transform_out[0]))
119
+ mask_list.append(
120
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
121
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
122
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
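+ # The loop below also builds half- and quarter-resolution crops of the same region for the multi-scale decoder; c_h / c_w are reused so all crops stay spatially aligned.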
123
+ for n in range(2):
124
+ tmp_comp = cv2.resize(composite_image, (
125
+ composite_image.shape[1] // 2 ** (n + 1), composite_image.shape[0] // 2 ** (n + 1)))
126
+ tmp_mask = cv2.resize(mask, (mask.shape[1] // 2 ** (n + 1), mask.shape[0] // 2 ** (n + 1)))
127
+ tmp_coord = prepare_cooridinate_input(tmp_mask).transpose(1, 2, 0)
128
+
129
+ transform_out, c_h, c_w = customRandomCrop([tmp_comp, tmp_mask, tmp_coord],
130
+ self.split_height_resolution // 2 ** (n + 1),
131
+ self.split_width_resolution // 2 ** (n + 1), c_h, c_w)
132
+ compos_list.append(self.torch_transforms(transform_out[0]))
133
+ mask_list.append(
134
+ torchvision.transforms.ToTensor()(transform_out[1][..., np.newaxis].astype(np.float32)))
135
+ coord_map_list.append(torchvision.transforms.ToTensor()(transform_out[2]))
136
+ out_comp = compos_list
137
+ out_mask = mask_list
138
+ out_coord = coord_map_list
139
+
140
+ fg_INR_coordinates, bg_INR_coordinates, fg_INR_RGB, fg_transfer_INR_RGB, bg_INR_RGB = self.INR_dataset.generator(
141
+ self.torch_transforms, transform_out[0], transform_out[0], mask)
142
+
143
+ return {
144
+ 'composite_image': out_comp,
145
+ 'mask': out_mask,
146
+ 'coordinate_map': out_coord,
147
+ 'composite_image0': out_comp[0],
148
+ 'mask0': out_mask[0],
149
+ 'coordinate_map0': out_coord[0],
150
+ 'composite_image1': out_comp[1],
151
+ 'mask1': out_mask[1],
152
+ 'coordinate_map1': out_coord[1],
153
+ 'composite_image2': out_comp[2],
154
+ 'mask2': out_mask[2],
155
+ 'coordinate_map2': out_coord[2],
156
+ 'composite_image3': out_comp[3],
157
+ 'mask3': out_mask[3],
158
+ 'coordinate_map3': out_coord[3],
159
+ 'fg_INR_coordinates': fg_INR_coordinates,
160
+ 'bg_INR_coordinates': bg_INR_coordinates,
161
+ 'fg_INR_RGB': fg_INR_RGB,
162
+ 'fg_transfer_INR_RGB': fg_transfer_INR_RGB,
163
+ 'bg_INR_RGB': bg_INR_RGB,
164
+ 'start_point': self.split_start_point[idx],
165
+ }
166
+
167
+
168
+ def parse_args():
169
+ parser = argparse.ArgumentParser()
170
+
171
+ parser.add_argument('--split_num', type=int, default=4,
172
+ help='How many pieces to split the image into along its width / height.')
173
+
174
+ parser.add_argument('--composite_image', type=str, default=r'./demo/demo_2k_composite.jpg',
175
+ help='composite image path')
176
+
177
+ parser.add_argument('--mask', type=str, default=r'./demo/demo_2k_mask.jpg',
178
+ help='mask path')
179
+
180
+ parser.add_argument('--save_path', type=str, default=r'./demo/',
181
+ help='save path')
182
+
183
+ parser.add_argument('--workers', type=int, default=8,
184
+ metavar='N', help='Dataloader threads.')
185
+
186
+ parser.add_argument('--batch_size', type=int, default=1,
187
+ help='You can override the model batch size by specifying a positive number.')
188
+
189
+ parser.add_argument('--device', type=str, default='cuda',
190
+ help="Whether use cuda, 'cuda' or 'cpu'.")
191
+
192
+ parser.add_argument('--base_size', type=int, default=256,
193
+ help='Base size. Resolution of the image input into the Encoder')
194
+
195
+ parser.add_argument('--input_size', type=int, default=256,
196
+ help='Input size. Resolution of the image to be generated by the Decoder.')
197
+
198
+ parser.add_argument('--INR_input_size', type=int, default=256,
199
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
200
+ 'Should be the same as `input_size`')
201
+
202
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
203
+ help='Number of channels for INR linear layer.')
204
+
205
+ parser.add_argument('--LUT_dim', type=int, default=7,
206
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
207
+
208
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
209
+ help='INR activation layer type: leakyrelu_pe, sine')
210
+
211
+ parser.add_argument('--pretrained', type=str,
212
+ default=r'.\pretrained_models\Resolution_RAW_iHarmony4.pth',
213
+ help='Pretrained weight path')
214
+
215
+ parser.add_argument('--param_factorize_dim', type=int,
216
+ default=10,
217
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
218
+ 'Refer to https://arxiv.org/abs/2011.12026')
219
+
220
+ parser.add_argument('--embedding_type', type=str,
221
+ default="CIPS_embed",
222
+ help='Which embedding_type to use.')
223
+
224
+ parser.add_argument('--INRDecode', action="store_false",
225
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
226
+ '(https://github.com/SamsungLabs/image_harmonization)')
227
+
228
+ parser.add_argument('--isMoreINRInput', action="store_false",
229
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
230
+
231
+ parser.add_argument('--hr_train', action="store_false",
232
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
233
+
234
+ parser.add_argument('--isFullRes', action="store_true",
235
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
236
+
237
+ opt = parser.parse_args()
238
+
239
+ return opt
240
+
241
+ @torch.no_grad()
242
+ def inference(model, opt, composite_image=None, mask=None):
243
+ model.eval()
244
+
245
+ "dataset here is actually consisted of several patches of a single image."
246
+ singledataset = single_image_dataset(opt, composite_image, mask)
247
+
248
+ single_data_loader = DataLoader(singledataset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
249
+ num_workers=opt.workers, persistent_workers=False if composite_image is not None else True)
250
+
251
+ "Init a pure black image with the same size as the input image."
252
+ init_img = np.zeros_like(singledataset.composite_image)
253
+
254
+ time_all = 0
255
+
256
+ for step, batch in tqdm.tqdm(enumerate(single_data_loader)):
257
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
258
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
259
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
260
+ start_points = batch['start_point']
261
+
262
+ if opt.batch_size == 1:
263
+ start_points = [torch.cat(start_points)]
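+ # With batch_size 1 the default collate yields the (row, col) start point as two one-element tensors, so the line above joins them back into a single (row, col) tensor for the pasting step.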
264
+
265
+ fg_INR_coordinates = coordinate_map[1:]
266
+
267
+ try:
268
+ if step == 0: # This is for CUDA Kernel Warm-up, or the first inference step will be quite slow.
269
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
270
+ composite_image,
271
+ mask,
272
+ fg_INR_coordinates,
273
+ )
274
+ if opt.device == "cuda":
275
+ torch.cuda.reset_max_memory_allocated()
276
+ torch.cuda.reset_max_memory_cached()
277
+ start_time = time.time()
278
+ torch.cuda.synchronize()
279
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
280
+ composite_image,
281
+ mask,
282
+ fg_INR_coordinates,
283
+ )
284
+ if opt.device == "cuda":
285
+ torch.cuda.synchronize()
286
+ end_time = time.time()
287
+
288
+ end_max_memory = torch.cuda.max_memory_allocated() // 1024 ** 2
289
+ end_memory = torch.cuda.memory_allocated() // 1024 ** 2
290
+
291
+ print(f'GPU max memory usage: {end_max_memory} MB')
292
+ print(f'GPU memory usage: {end_memory} MB')
293
+ time_all += (end_time - start_time)
294
+ print(f'progress: {step} / {len(single_data_loader)}')
295
+ except:
296
+ raise Exception(
297
+ f'The image resolution is too large. Please increase the `split_num` value. Your current setting is {opt.split_num}')
298
+
299
+ "Assemble the every patch's harmonized result into the final whole image."
300
+ for id in range(len(fg_INR_coordinates[0])):
301
+ pred_fg_image = fg_content_bg_appearance_construct[-1][id]
302
+ pred_harmonized_image = pred_fg_image * (mask[1][id] > 100 / 255.) + composite_image[1][id] * (
303
+ ~(mask[1][id] > 100 / 255.))
304
+
305
+ pred_harmonized_tmp = cv2.cvtColor(
306
+ normalize(pred_harmonized_image.unsqueeze(0), opt, 'inv')[0].permute(1, 2, 0).cpu().mul_(255.).clamp_(
307
+ 0., 255.).numpy().astype(np.uint8), cv2.COLOR_RGB2BGR)
308
+
309
+ init_img[start_points[id][0]:start_points[id][0] + singledataset.split_height_resolution,
310
+ start_points[id][1]:start_points[id][1] + singledataset.split_width_resolution] = pred_harmonized_tmp
311
+
312
+ print(f'Inference time: {time_all}')
313
+ if opt.save_path is not None:
314
+ os.makedirs(opt.save_path, exist_ok=True)
315
+ cv2.imwrite(os.path.join(opt.save_path, "pred_harmonized_image.jpg"), init_img)
316
+ return init_img
317
+
318
+
319
+ def main_process(opt, composite_image=None, mask=None):
320
+ cudnn.benchmark = True
321
+
322
+ model = build_model(opt).to(opt.device)
323
+
324
+ load_dict = torch.load(opt.pretrained)['model']
325
+ for k in load_dict.keys():
326
+ if k not in model.state_dict().keys():
327
+ print(f"Skip {k}")
328
+ model.load_state_dict(load_dict, strict=False)
329
+
330
+ return inference(model, opt, composite_image, mask)
331
+
332
+
333
+ if __name__ == '__main__':
334
+ opt = parse_args()
335
+ opt.transform_mean = [.5, .5, .5]
336
+ opt.transform_var = [.5, .5, .5]
337
+ main_process(opt)
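+ # Example launch with the bundled demo inputs: python inference_for_arbitrary_resolution_image.py --composite_image ./demo/demo_2k_composite.jpg --mask ./demo/demo_2k_mask.jpg --split_num 4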
processing.py ADDED
@@ -0,0 +1,308 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import time
3
+ import datetime
4
+
5
+ import torch
6
+ import torchvision
7
+
8
+ from utils import misc, metrics
9
+
10
+ best_psnr = 0
11
+
12
+
13
+ def train(train_loader, val_loader, model, optimizer, scheduler, loss_fn, logger, opt):
14
+ total_step = opt.epochs * len(train_loader)
15
+
16
+ step_time_log = misc.AverageMeter()
17
+ loss_log = misc.AverageMeter(':6f')
18
+ loss_fg_content_bg_appearance_construct_log = misc.AverageMeter(':6f')
19
+ loss_lut_transform_image_log = misc.AverageMeter(':6f')
20
+ loss_lut_regularize_log = misc.AverageMeter(':6f')
21
+
22
+ start_epoch = 0
23
+
24
+ "Load pretrained checkpoints"
25
+ if opt.pretrained is not None:
26
+ logger.info(f"Load pretrained weight from {opt.pretrained}")
27
+ load_state = torch.load(opt.pretrained)
28
+ model = model.cpu()
29
+ model.load_state_dict(load_state['model'])
30
+ model = model.to(opt.device)
31
+ optimizer.load_state_dict(load_state['optimizer'])
32
+ scheduler.load_state_dict(load_state['scheduler'])
33
+ start_epoch = load_state['last_epoch'] + 1
34
+
35
+ for epoch in range(start_epoch, opt.epochs):
36
+ model.train()
37
+ time_ckp = time.time()
38
+ for step, batch in enumerate(train_loader):
39
+ current_step = epoch * len(train_loader) + step + 1
40
+
41
+ if opt.INRDecode and opt.hr_train:
42
+ "List with 4 elements: [Input to Encoder, three different resolutions' crop to INR Decoder]"
43
+ composite_image = [batch[f'composite_image{name}'].to(opt.device) for name in range(4)]
44
+ real_image = [batch[f'real_image{name}'].to(opt.device) for name in range(4)]
45
+ mask = [batch[f'mask{name}'].to(opt.device) for name in range(4)]
46
+ coordinate_map = [batch[f'coordinate_map{name}'].to(opt.device) for name in range(4)]
47
+
48
+ fg_INR_coordinates = coordinate_map[1:]
49
+
50
+ else:
51
+ composite_image = batch['composite_image'].to(opt.device)
52
+ real_image = batch['real_image'].to(opt.device)
53
+ mask = batch['mask'].to(opt.device)
54
+
55
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
56
+
57
+ fg_content_bg_appearance_construct, fit_lut3d, lut_transform_image = model(
58
+ composite_image, mask, fg_INR_coordinates)
59
+
60
+ if opt.INRDecode:
61
+ loss_fg_content_bg_appearance_construct = 0
62
+ """
63
+ Our LRIP module requires three layers at different resolutions, so
64
+ `loss_fg_content_bg_appearance_construct` is accumulated over multiple layers.
65
+ Besides, when leveraging `hr_train`, i.e. using the RSC strategy (see Section 3.4), `real_image`
66
+ and `mask` are lists holding crops at the corresponding resolutions.
67
+ """
68
+ if opt.hr_train:
69
+ for n in range(3):
70
+ loss_fg_content_bg_appearance_construct += loss_fn['masked_mse'] \
71
+ (fg_content_bg_appearance_construct[n], real_image[3 - n], mask[3 - n])
72
+ loss_fg_content_bg_appearance_construct /= 3
73
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image[1], mask[1])
74
+ else:
75
+ for n in range(3):
76
+ loss_fg_content_bg_appearance_construct += loss_fn['MaskWeightedMSE'] \
77
+ (fg_content_bg_appearance_construct[n],
78
+ torchvision.transforms.Resize(opt.INR_input_size // 2 ** (3 - n - 1))(real_image),
79
+ torchvision.transforms.Resize(opt.INR_input_size // 2 ** (3 - n - 1))(mask))
80
+ loss_fg_content_bg_appearance_construct /= 3
81
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image, mask)
82
+ loss_lut_regularize = loss_fn['regularize_LUT'](fit_lut3d)
83
+
84
+ else:
85
+ loss_fg_content_bg_appearance_construct = 0
86
+ loss_lut_transform_image = loss_fn['masked_mse'](lut_transform_image, real_image, mask)
87
+ loss_lut_regularize = 0
88
+
89
+ loss = loss_fg_content_bg_appearance_construct + loss_lut_transform_image + loss_lut_regularize
90
+ optimizer.zero_grad()
91
+ loss.backward()
92
+ optimizer.step()
93
+ scheduler.step()
94
+
95
+ step_time_log.update(time.time() - time_ckp)
96
+
97
+ loss_fg_content_bg_appearance_construct_log.update(0 if isinstance(loss_fg_content_bg_appearance_construct,
98
+ int) else loss_fg_content_bg_appearance_construct.item())
99
+ loss_lut_transform_image_log.update(
100
+ 0 if isinstance(loss_lut_transform_image, int) else loss_lut_transform_image.item())
101
+ loss_lut_regularize_log.update(0 if isinstance(loss_lut_regularize, int) else loss_lut_regularize.item())
102
+ loss_log.update(loss.item())
103
+
104
+ if current_step % opt.print_freq == 0:
105
+ remain_secs = (total_step - current_step) * step_time_log.avg
106
+ remain_time = datetime.timedelta(seconds=round(remain_secs))
107
+ finish_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time() + remain_secs))
108
+
109
+ log_msg = f'Epoch: [{epoch}/{opt.epochs}]\t' \
110
+ f'Step: [{step}/{len(train_loader)}]\t' \
111
+ f'StepTime {step_time_log.val:.3f} ({step_time_log.avg:.3f})\t' \
112
+ f'lr {optimizer.param_groups[0]["lr"]}\t' \
113
+ f'Loss {loss_log.val:.4f} ({loss_log.avg:.4f})\t' \
114
+ f'Loss_fg_bg_cons {loss_fg_content_bg_appearance_construct_log.val:.4f} ({loss_fg_content_bg_appearance_construct_log.avg:.4f})\t' \
115
+ f'Loss_lut_trans {loss_lut_transform_image_log.val:.4f} ({loss_lut_transform_image_log.avg:.4f})\t' \
116
+ f'Loss_lut_reg {loss_lut_regularize_log.val:.4f} ({loss_lut_regularize_log.avg:.4f})\t' \
117
+ f'Remaining Time {remain_time} ({finish_time})'
118
+ logger.info(log_msg)
119
+
120
+ if opt.wandb:
121
+ import wandb
122
+ wandb.log(
123
+ {'Train/Epoch': epoch, 'Train/lr': optimizer.param_groups[0]['lr'], 'Train/Step': current_step,
124
+ 'Train/Loss': loss_log.val,
125
+ 'Train/Loss_fg_bg_cons': loss_fg_content_bg_appearance_construct_log.val,
126
+ 'Train/Loss_lut_trans': loss_lut_transform_image_log.val,
127
+ 'Train/Loss_lut_reg': loss_lut_regularize_log.val,
128
+ })
129
+
130
+ time_ckp = time.time()
131
+
132
+ state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'last_epoch': epoch,
133
+ 'scheduler': scheduler.state_dict()}
134
+
135
+ """
136
+ Since validation at the original resolution has no consistent image size across samples
137
+ (so the images cannot form a batch) and may also cause out-of-memory issues when combined with the training phase,
138
+ we only save the model here when `opt.isFullRes` is True, leaving the evaluation to `inference.py`.
139
+ """
140
+ if opt.isFullRes and opt.hr_train:
141
+ if epoch % 5 == 0:
142
+ torch.save(state, os.path.join(opt.save_path, f"epoch{epoch}.pth"))
143
+ else:
144
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
145
+ else:
146
+ val(val_loader, model, logger, opt, state)
147
+
148
+
149
+ def val(val_loader, model, logger, opt, state):
150
+ global best_psnr
151
+ current_process = 10
152
+ model.eval()
153
+
154
+ metric_log = {
155
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
156
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
157
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
158
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
159
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
160
+ }
161
+
162
+ lut_metric_log = {
163
+ 'HAdobe5k': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
164
+ 'HCOCO': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
165
+ 'Hday2night': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
166
+ 'HFlickr': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
167
+ 'All': {'Samples': 0, 'MSE': 0, 'fMSE': 0, 'PSNR': 0, 'SSIM': 0},
168
+ }
169
+
170
+ for step, batch in enumerate(val_loader):
171
+ composite_image = batch['composite_image'].to(opt.device)
172
+ real_image = batch['real_image'].to(opt.device)
173
+ mask = batch['mask'].to(opt.device)
174
+ category = batch['category']
175
+
176
+ fg_INR_coordinates = batch['fg_INR_coordinates'].to(opt.device)
177
+ bg_INR_coordinates = batch['bg_INR_coordinates'].to(opt.device)
178
+ fg_transfer_INR_RGB = batch['fg_transfer_INR_RGB'].to(opt.device)
179
+
180
+ with torch.no_grad():
181
+ fg_content_bg_appearance_construct, _, lut_transform_image = model(
182
+ composite_image,
183
+ mask,
184
+ fg_INR_coordinates,
185
+ bg_INR_coordinates)
186
+ if opt.INRDecode:
187
+ pred_fg_image = fg_content_bg_appearance_construct[-1]
188
+ else:
189
+ pred_fg_image = None
190
+ fg_transfer_INR_RGB = misc.lin2img(fg_transfer_INR_RGB,
191
+ val_loader.dataset.INR_dataset.size) if fg_transfer_INR_RGB is not None else None
192
+
193
+ "For INR"
194
+ mask_INR = torchvision.transforms.Resize(opt.INR_input_size)(mask)
195
+
196
+ if not opt.INRDecode:
197
+ pred_harmonized_image = None
198
+ else:
199
+ pred_harmonized_image = pred_fg_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
200
+ lut_transform_image = lut_transform_image * (mask > 100 / 255.) + real_image * (~(mask > 100 / 255.))
201
+
202
+ "Save the output images. For every 10 epochs, save more results, otherwise, save little. Thus save storage."
203
+ if state['last_epoch'] % 10 == 0:
204
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
205
+ pred_harmonized_image, lut_transform_image, opt, state['last_epoch'], show=False,
206
+ wandb=opt.wandb, isAll=True, step=step)
207
+ elif step == 0:
208
+ misc.visualize(real_image, composite_image, mask, pred_fg_image,
209
+ pred_harmonized_image, lut_transform_image, opt, state['last_epoch'], show=False,
210
+ wandb=opt.wandb, step=step)
211
+
212
+ if opt.INRDecode:
213
+ mse, fmse, psnr, ssim = metrics.calc_metrics(misc.normalize(pred_harmonized_image, opt, 'inv'),
214
+ misc.normalize(fg_transfer_INR_RGB, opt, 'inv'), mask_INR)
215
+
216
+ lut_mse, lut_fmse, lut_psnr, lut_ssim = metrics.calc_metrics(misc.normalize(lut_transform_image, opt, 'inv'),
217
+ misc.normalize(real_image, opt, 'inv'), mask)
218
+
219
+ for idx in range(len(category)):
220
+ if opt.INRDecode:
221
+ metric_log[category[idx]]['Samples'] += 1
222
+ metric_log[category[idx]]['MSE'] += mse[idx]
223
+ metric_log[category[idx]]['fMSE'] += fmse[idx]
224
+ metric_log[category[idx]]['PSNR'] += psnr[idx]
225
+ metric_log[category[idx]]['SSIM'] += ssim[idx]
226
+
227
+ metric_log['All']['Samples'] += 1
228
+ metric_log['All']['MSE'] += mse[idx]
229
+ metric_log['All']['fMSE'] += fmse[idx]
230
+ metric_log['All']['PSNR'] += psnr[idx]
231
+ metric_log['All']['SSIM'] += ssim[idx]
232
+
233
+ lut_metric_log[category[idx]]['Samples'] += 1
234
+ lut_metric_log[category[idx]]['MSE'] += lut_mse[idx]
235
+ lut_metric_log[category[idx]]['fMSE'] += lut_fmse[idx]
236
+ lut_metric_log[category[idx]]['PSNR'] += lut_psnr[idx]
237
+ lut_metric_log[category[idx]]['SSIM'] += lut_ssim[idx]
238
+
239
+ lut_metric_log['All']['Samples'] += 1
240
+ lut_metric_log['All']['MSE'] += lut_mse[idx]
241
+ lut_metric_log['All']['fMSE'] += lut_fmse[idx]
242
+ lut_metric_log['All']['PSNR'] += lut_psnr[idx]
243
+ lut_metric_log['All']['SSIM'] += lut_ssim[idx]
244
+
245
+ if (step + 1) / len(val_loader) * 100 >= current_process:
246
+ logger.info(f'Processing: {current_process}')
247
+ current_process += 10
248
+
249
+ logger.info('=========================')
250
+ for key in metric_log.keys():
251
+ if opt.INRDecode:
252
+ msg = f"{key}-'MSE': {metric_log[key]['MSE'] / metric_log[key]['Samples']:.2f}\n" \
253
+ f"{key}-'fMSE': {metric_log[key]['fMSE'] / metric_log[key]['Samples']:.2f}\n" \
254
+ f"{key}-'PSNR': {metric_log[key]['PSNR'] / metric_log[key]['Samples']:.2f}\n" \
255
+ f"{key}-'SSIM': {metric_log[key]['SSIM'] / metric_log[key]['Samples']:.4f}\n" \
256
+ f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
257
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
258
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
259
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
260
+ else:
261
+ msg = f"{key}-'LUT_MSE': {lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
262
+ f"{key}-'LUT_fMSE': {lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples']:.2f}\n" \
263
+ f"{key}-'LUT_PSNR': {lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples']:.2f}\n" \
264
+ f"{key}-'LUT_SSIM': {lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']:.4f}\n"
265
+
266
+ logger.info(msg)
267
+
268
+ if opt.wandb:
269
+ import wandb
270
+ if opt.INRDecode:
271
+ wandb.log(
272
+ {f'Val/{key}/Epoch': state['last_epoch'],
273
+ f'Val/{key}/MSE': metric_log[key]['MSE'] / metric_log[key]['Samples'],
274
+ f'Val/{key}/fMSE': metric_log[key]['fMSE'] / metric_log[key]['Samples'],
275
+ f'Val/{key}/PSNR': metric_log[key]['PSNR'] / metric_log[key]['Samples'],
276
+ f'Val/{key}/SSIM': metric_log[key]['SSIM'] / metric_log[key]['Samples'],
277
+ f'Val/{key}/LUT_MSE': lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples'],
278
+ f'Val/{key}/LUT_fMSE': lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples'],
279
+ f'Val/{key}/LUT_PSNR': lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples'],
280
+ f'Val/{key}/LUT_SSIM': lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']
281
+ })
282
+ else:
283
+ wandb.log(
284
+ {f'Val/{key}/Epoch': state['last_epoch'],
285
+ f'Val/{key}/LUT_MSE': lut_metric_log[key]['MSE'] / lut_metric_log[key]['Samples'],
286
+ f'Val/{key}/LUT_fMSE': lut_metric_log[key]['fMSE'] / lut_metric_log[key]['Samples'],
287
+ f'Val/{key}/LUT_PSNR': lut_metric_log[key]['PSNR'] / lut_metric_log[key]['Samples'],
288
+ f'Val/{key}/LUT_SSIM': lut_metric_log[key]['SSIM'] / lut_metric_log[key]['Samples']
289
+ })
290
+
291
+ logger.info('=========================')
292
+
293
+ if not opt.INRDecode:
294
+ if lut_metric_log['All']['PSNR'] / lut_metric_log['All']['Samples'] > best_psnr:
295
+ logger.info("Best Save!")
296
+ best_psnr = lut_metric_log['All']['PSNR'] / lut_metric_log['All']['Samples']
297
+ torch.save(state, os.path.join(opt.save_path, "best.pth"))
298
+ else:
299
+ logger.info("Last Save!")
300
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
301
+ else:
302
+ if metric_log['All']['PSNR'] / metric_log['All']['Samples'] > best_psnr:
303
+ logger.info("Best Save!")
304
+ best_psnr = metric_log['All']['PSNR'] / metric_log['All']['Samples']
305
+ torch.save(state, os.path.join(opt.save_path, "best.pth"))
306
+ else:
307
+ logger.info("Last Save!")
308
+ torch.save(state, os.path.join(opt.save_path, "last.pth"))
requirements.txt ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ adamp==0.3.0
2
+ albumentations==1.2.0
3
+ numpy==1.21.2
4
+ opencv_python==4.5.4.58
5
+ opencv_python_headless==4.6.0.66
6
+ pytorch_msssim==0.2.1
7
+ scikit_image==0.18.3
8
+ torch==1.12.0+cu113
9
+ torchvision==0.13.0+cu113
10
+ tqdm==4.62.2
11
+ wandb==0.12.21
train.py ADDED
@@ -0,0 +1,161 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import argparse
3
+
4
+ import albumentations
5
+ from albumentations import HorizontalFlip, Resize, RandomResizedCrop
6
+
7
+ import torch.backends.cudnn as cudnn
8
+ import torchvision.transforms as transforms
9
+ from torch.utils.data import DataLoader
10
+ from torch.optim import lr_scheduler
11
+
12
+ import processing
13
+ from utils import build_loss, misc
14
+ from model.build_model import build_model
15
+ from datasets.build_dataset import dataset_generator
16
+
17
+
18
+ def parse_args():
19
+ parser = argparse.ArgumentParser()
20
+
21
+ parser.add_argument('--workers', type=int, default=8,
22
+ metavar='N', help='Dataloader threads.')
23
+
24
+ parser.add_argument('--batch_size', type=int, default=16,
25
+ help='You can override the model batch size by specifying a positive number.')
26
+
27
+ parser.add_argument('--device', type=str, default='cuda',
28
+ help="Whether use cuda, 'cuda' or 'cpu'.")
29
+
30
+ parser.add_argument('--epochs', type=int, default=60,
31
+ help='Epochs number.')
32
+
33
+ parser.add_argument('--lr', type=float, default=1e-4,
34
+ help='Learning rate.')
35
+
36
+ parser.add_argument('--save_path', type=str, default="./logs",
37
+ help='Where to save logs and checkpoints.')
38
+
39
+ parser.add_argument('--dataset_path', type=str, default=r".\iHarmony4",
40
+ help='Dataset path.')
41
+
42
+ parser.add_argument('--print_freq', type=int, default=100,
43
+ help='Number of iterations then print.')
44
+
45
+ parser.add_argument('--base_size', type=int, default=256,
46
+ help='Base size. Resolution of the image input into the Encoder')
47
+
48
+ parser.add_argument('--input_size', type=int, default=256,
49
+ help='Input size. Resolution of the image to be generated by the Decoder.')
50
+
51
+ parser.add_argument('--INR_input_size', type=int, default=256,
52
+ help='INR input size. Resolution of the image to be generated by the Decoder. '
53
+ 'Should be the same as `input_size`')
54
+
55
+ parser.add_argument('--INR_MLP_dim', type=int, default=32,
56
+ help='Number of channels for INR linear layer.')
57
+
58
+ parser.add_argument('--LUT_dim', type=int, default=7,
59
+ help='Dim of the output LUT. Refer to https://ieeexplore.ieee.org/abstract/document/9206076')
60
+
61
+ parser.add_argument('--activation', type=str, default='leakyrelu_pe',
62
+ help='INR activation layer type: leakyrelu_pe, sine')
63
+
64
+ parser.add_argument('--pretrained', type=str,
65
+ default=None,
66
+ help='Pretrained weight path')
67
+
68
+ parser.add_argument('--param_factorize_dim', type=int,
69
+ default=10,
70
+ help='The intermediate dimensions of the factorization of the predicted MLP parameters. '
71
+ 'Refer to https://arxiv.org/abs/2011.12026')
72
+
73
+ parser.add_argument('--embedding_type', type=str,
74
+ default="CIPS_embed",
75
+ help='Which embedding_type to use.')
76
+
77
+ parser.add_argument('--optim', type=str,
78
+ default='adamw',
79
+ help='Which optimizer to use.')
80
+
81
+ parser.add_argument('--INRDecode', action="store_false",
82
+ help='Whether to use the INR decoder. Set it to False if you want to test the baseline '
83
+ '(https://github.com/SamsungLabs/image_harmonization)')
84
+
85
+ parser.add_argument('--isMoreINRInput', action="store_false",
86
+ help='Whether to concatenate RGB and mask. See Section 3.4 in the paper.')
87
+
88
+ parser.add_argument('--hr_train', action="store_true",
89
+ help='Whether to use hr_train. See Section 3.4 in the paper.')
90
+
91
+ parser.add_argument('--isFullRes', action="store_true",
92
+ help='Whether to use the original resolution. See Section 3.4 in the paper.')
93
+
94
+ opt = parser.parse_args()
95
+
96
+ opt.save_path = misc.increment_path(os.path.join(opt.save_path, "exp1"))
97
+
98
+ try:
99
+ import wandb
100
+ opt.wandb = True
101
+ wandb.init(config=opt, project="INR_Harmonization", name=os.path.basename(opt.save_path))
102
+
103
+ except:
104
+ opt.wandb = False
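+ # Any failure to import or initialize wandb simply disables wandb logging for this run.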
105
+
106
+ return opt
107
+
108
+
109
+ def main_process(opt):
110
+ logger = misc.create_logger(os.path.join(opt.save_path, "log.txt"))
111
+ cudnn.benchmark = True
112
+
113
+ trainset_path = os.path.join(opt.dataset_path, "IHD_train.txt")
114
+ valset_path = os.path.join(opt.dataset_path, "IHD_test.txt")
115
+
116
+ opt.transform_mean = [.5, .5, .5]
117
+ opt.transform_var = [.5, .5, .5]
118
+ torch_transform = transforms.Compose([transforms.ToTensor(),
119
+ transforms.Normalize(opt.transform_mean, opt.transform_var)])
120
+
121
+ trainset_alb_transform = albumentations.Compose(
122
+ [
123
+ RandomResizedCrop(opt.input_size, opt.input_size, scale=(0.5, 1.0)),
124
+ HorizontalFlip()],
125
+ additional_targets={'real_image': 'image', 'object_mask': 'image'}
126
+ )
127
+
128
+ valset_alb_transform = albumentations.Compose([Resize(opt.input_size, opt.input_size)],
129
+ additional_targets={'real_image': 'image', 'object_mask': 'image'})
130
+
131
+ trainset = dataset_generator(trainset_path, trainset_alb_transform, torch_transform, opt, mode='Train')
132
+
133
+ valset = dataset_generator(valset_path, valset_alb_transform, torch_transform, opt, mode='Val')
134
+
135
+ train_loader = DataLoader(trainset, opt.batch_size, shuffle=True, drop_last=True,
136
+ pin_memory=True,
137
+ num_workers=opt.workers, persistent_workers=True)
138
+
139
+ val_loader = DataLoader(valset, opt.batch_size, shuffle=False, drop_last=False, pin_memory=True,
140
+ num_workers=opt.workers, persistent_workers=True)
141
+
142
+ model = build_model(opt).to(opt.device)
143
+
144
+ loss_fn = build_loss.loss_generator()
145
+
146
+ optimizer_params = {
147
+ 'lr': opt.lr,
148
+ 'weight_decay': 1e-2
149
+ }
150
+ optimizer = misc.get_optimizer(model, opt.optim, optimizer_params)
151
+
152
+ scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr=opt.lr, total_steps=opt.epochs * len(train_loader),
153
+ pct_start=0.0)
154
+
155
+ processing.train(train_loader, val_loader, model, optimizer, scheduler, loss_fn, logger, opt)
156
+
157
+
158
+ if __name__ == '__main__':
159
+ opt = parse_args()
160
+ os.makedirs(opt.save_path, exist_ok=True)
161
+ main_process(opt)
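+ # Example launch (dataset path is a placeholder): python train.py --dataset_path /path/to/iHarmony4 --batch_size 16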