LB5 committed
Commit 22b8701 · 1 parent: e6a22e6

Upload 45 files

Files changed (46)
  1. .gitattributes +1 -0
  2. Docs/pics/img_mask_blur_21.jpg +0 -0
  3. Docs/pics/img_mask_blur_41.jpg +0 -0
  4. Docs/pics/img_mask_blur_61.jpg +0 -0
  5. Docs/pics/img_mask_erode_0.jpg +0 -0
  6. Docs/pics/img_mask_erode_20.jpg +0 -0
  7. Docs/pics/img_mask_erode_40.jpg +0 -0
  8. README.md +178 -12
  9. app.py +74 -0
  10. app_web.py +160 -0
  11. configs/run_image.yaml +35 -0
  12. configs/run_image_specific.yaml +35 -0
  13. configs/run_video.yaml +36 -0
  14. configs/run_video_specific.yaml +36 -0
  15. demo_file/Iron_man.jpg +0 -0
  16. demo_file/multi_people.jpg +0 -0
  17. demo_file/multi_people_1080p.mp4 +3 -0
  18. demo_file/multispecific/DST_01.jpg +0 -0
  19. demo_file/multispecific/DST_02.jpg +0 -0
  20. demo_file/multispecific/DST_03.jpg +0 -0
  21. demo_file/multispecific/SRC_01.png +0 -0
  22. demo_file/multispecific/SRC_02.png +0 -0
  23. demo_file/multispecific/SRC_03.png +0 -0
  24. demo_file/specific1.png +0 -0
  25. demo_file/specific2.png +0 -0
  26. demo_file/specific3.png +0 -0
  27. requirements.txt +9 -0
  28. src/Blend/blend.py +12 -0
  29. src/DataManager/ImageDataManager.py +42 -0
  30. src/DataManager/VideoDataManager.py +73 -0
  31. src/DataManager/base.py +16 -0
  32. src/DataManager/utils.py +12 -0
  33. src/FaceAlign/face_align.py +244 -0
  34. src/FaceDetector/face_detector.py +37 -0
  35. src/FaceId/faceid.py +50 -0
  36. src/Generator/fs_networks_512.py +277 -0
  37. src/Generator/fs_networks_fix.py +245 -0
  38. src/Misc/types.py +11 -0
  39. src/Misc/utils.py +28 -0
  40. src/PostProcess/GFPGAN/gfpgan.py +341 -0
  41. src/PostProcess/GFPGAN/stylegan2.py +351 -0
  42. src/PostProcess/ParsingModel/model.py +323 -0
  43. src/PostProcess/ParsingModel/resnet.py +109 -0
  44. src/PostProcess/utils.py +122 -0
  45. src/model_loader.py +106 -0
  46. src/simswap.py +322 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  simswap-inference-pytorch-main/demo_file/multi_people_1080p.mp4 filter=lfs diff=lfs merge=lfs -text
+ demo_file/multi_people_1080p.mp4 filter=lfs diff=lfs merge=lfs -text
Docs/pics/img_mask_blur_21.jpg ADDED
Docs/pics/img_mask_blur_41.jpg ADDED
Docs/pics/img_mask_blur_61.jpg ADDED
Docs/pics/img_mask_erode_0.jpg ADDED
Docs/pics/img_mask_erode_20.jpg ADDED
Docs/pics/img_mask_erode_40.jpg ADDED
README.md CHANGED
@@ -1,12 +1,178 @@
- ---
- title: Simswap55
- emoji: 🌖
- colorFrom: blue
- colorTo: yellow
- sdk: gradio
- sdk_version: 3.20.1
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Unofficial PyTorch implementation (**inference only**) of SimSwap: An Efficient Framework For High Fidelity Face Swapping

## Updates
- Improved performance (up to 40% in some scenarios, depending on frame resolution and the number of swaps per frame).
- Fixed a problem with overlapping areas from close faces (https://github.com/mike9251/simswap-inference-pytorch/issues/21).
- Added support for the GFPGAN model as an additional post-processing step to improve the final image quality.
- Added a toy GUI app. It might be useful for understanding how different pipeline settings affect the output.

## Attention
***This project is for technical and academic use only. Please do not apply it to illegal or unethical scenarios.***

***In the event of a violation of the legal and ethical requirements of the user's country or region, this code repository is exempt from liability.***

## Preparation
### Installation
```
# clone the project
git clone https://github.com/mike9251/simswap-inference-pytorch
cd simswap-inference-pytorch

# [OPTIONAL] create a conda environment
conda create -n myenv python=3.9
conda activate myenv

# install pytorch and torchvision according to the instructions at
# https://pytorch.org/get-started/

# install the requirements
pip install -r requirements.txt
```

### Important
By default, face detection runs on the CPU. To run it on the GPU, install the ONNX GPU runtime:

```
pip install onnxruntime-gpu==1.11.1
```

and modify one line of code in `...Anaconda3\envs\myenv\Lib\site-packages\insightface\model_zoo\model_zoo.py`.

Here, instead of passing **None** as the second argument to the ONNX inference session
```python
class ModelRouter:
    def __init__(self, onnx_file):
        self.onnx_file = onnx_file

    def get_model(self):
        session = onnxruntime.InferenceSession(self.onnx_file, None)
        input_cfg = session.get_inputs()[0]
```
pass a list of providers:
```python
class ModelRouter:
    def __init__(self, onnx_file):
        self.onnx_file = onnx_file

    def get_model(self):
        session = onnxruntime.InferenceSession(self.onnx_file, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
        input_cfg = session.get_inputs()[0]
```
Otherwise, simply use the CPU ONNX runtime with only a minor performance drop.

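If you are unsure whether the GPU runtime is actually being picked up, a quick sanity check like the one below can help (the model path is just an example; point it at your `weights` folder):

```python
import onnxruntime

available = onnxruntime.get_available_providers()
print(available)  # should include "CUDAExecutionProvider" for onnxruntime-gpu

# Attach only the providers that are actually available in this build.
session = onnxruntime.InferenceSession(
    "weights/face_detector_scrfd_10g_bnkps.onnx", providers=available
)
print(session.get_providers())  # providers attached to this session
```
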
### Weights
#### Weights for all models are downloaded automatically.

You can also download the weights manually and put them inside the `weights` folder:

- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/face_detector_scrfd_10g_bnkps.onnx">face_detector_scrfd_10g_bnkps.onnx</a>
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/arcface_net.jit">arcface_net.jit</a>
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/parsing_model_79999_iter.pth">79999_iter.pth</a>
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/simswap_224_latest_net_G.pth">simswap_224_latest_net_G.pth</a> - official 224x224 model
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/simswap_512_390000_net_G.pth">simswap_512_390000_net_G.pth</a> - unofficial 512x512 model (I took it from <a href="https://github.com/neuralchen/SimSwap/issues/255">here</a>)
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/v1.1/GFPGANv1.4_ema.pth">GFPGANv1.4_ema.pth</a>
- weights/<a href="https://github.com/mike9251/simswap-inference-pytorch/releases/download/v1.2/blend_module.jit">blend_module.jit</a>

## Inference
### Web App
```
streamlit run app_web.py
```

### Command-line App
This repository supports inference in several modes, which can be easily configured with the config files in the **configs** folder.
- **Replace all faces in a target image / folder of images**
```
python app.py --config-name=run_image.yaml
```

- **Replace all faces in a video**
```
python app.py --config-name=run_video.yaml
```

- **Replace a specific face in a target image / folder of images**
```
python app.py --config-name=run_image_specific.yaml
```

- **Replace a specific face in a video**
```
python app.py --config-name=run_video_specific.yaml
```

Config files contain two main parts:

- **data**
  - *id_image* - source image; the identity of this person will be transferred.
  - *att_image* - target image; the attributes of the person in this image will be mixed with the identity from the source image. You can also specify a folder with multiple images here - the identity transfer will then be applied to every image in the folder.
  - *specific_id_image* - a specific person in the *att_image* you would like to replace, leaving any other people untouched.
  - *att_video* - the same as *att_image*, but for a video.
  - *clean_work_dir* - whether to remove the temporary folder with frame images (video configs only).

- **pipeline**
  - *face_detector_weights* - path to the weights file OR an empty string ("") for automatic weights downloading.
  - *face_id_weights* - path to the weights file OR an empty string ("") for automatic weights downloading.
  - *parsing_model_weights* - path to the weights file OR an empty string ("") for automatic weights downloading.
  - *simswap_weights* - path to the weights file OR an empty string ("") for automatic weights downloading.
  - *gfpgan_weights* - path to the weights file OR an empty string ("") for automatic weights downloading.
  - *device* - whether to run the application on the GPU or the CPU.
  - *crop_size* - size of the face crops the SimSwap model works with.
  - *checkpoint_type* - the official model works with 224x224 crops and has different pre-/post-processing (ImageNet-like). The latest official repository allows you to train your own models, but the architecture and pre-/post-processing are slightly different (1. Tanh removed from the last layer; 2. normalization to the [0...1] range). **If you run the official 224x224 model, set this parameter to "official_224"; otherwise use "none".**
  - *face_alignment_type* - affects the reference face key-point coordinates. **Possible values are "ffhq" and "none". Try both to see which one works better for your data.**
  - *smooth_mask_kernel_size* - a non-zero value, used to attenuate the size of the post-processing mask. You might want to play with this parameter (a rough illustration of how the mask parameters interact is sketched after this list).
  - *smooth_mask_iter* - a non-zero value; the number of times the face mask is smoothed.
  - *smooth_mask_threshold* - controls the face mask saturation. Valid values are in the range [0.0...1.0]. Tune this parameter if there are artifacts around the swapped faces.
  - *face_detector_threshold* - values in the range [0.0...1.0]. Higher values reduce the probability of false-positive detections but increase the probability of false negatives.
  - *specific_latent_match_threshold* - values in the range [0.0...inf]. Usually takes small values around 0.05.
  - *enhance_output* - whether to apply the GFPGAN model as a post-processing step.

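The mask parameters above are easier to grasp with a toy example. The snippet below only illustrates the general idea (repeatedly smooth the parsed face mask, then rescale it against a threshold so the border fades out); it is not the code this repository uses, and the function name `soften_mask` is made up for the example:

```python
import cv2
import numpy as np


def soften_mask(mask: np.ndarray, kernel_size: int = 17, iterations: int = 7,
                threshold: float = 0.9) -> np.ndarray:
    """Illustration only: feather a binary face mask with values in [0, 1]."""
    kernel = np.ones((kernel_size, kernel_size), np.float32) / (kernel_size ** 2)
    smooth = mask.astype(np.float32)
    for _ in range(iterations):                     # smooth_mask_iter
        smooth = cv2.filter2D(smooth, -1, kernel)   # smooth_mask_kernel_size
    # Values above the threshold saturate to 1; the rest form a soft border.
    return np.clip(smooth / threshold, 0.0, 1.0)    # smooth_mask_threshold
```

In this sketch, larger kernels and more iterations feather the mask more aggressively, while a lower threshold saturates more of it (harder edges).
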
### Overriding parameters from the command line
Every parameter in a config file can be overridden by specifying it directly on the command line. For example:

```
python app.py --config-name=run_image.yaml data.specific_id_image="path/to/the/image" pipeline.smooth_mask_kernel_size=21
```

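Since the app is built on Hydra, the same overrides can also be composed programmatically, which can be handy in a notebook. A minimal sketch, assuming it is run from the repository root (the override values are just examples):

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Build the same config the CLI would, with a couple of overrides applied.
with initialize(config_path="configs"):
    cfg = compose(
        config_name="run_image",
        overrides=["pipeline.device=cpu", "pipeline.enhance_output=False"],
    )

# Interpolations such as ${hydra:runtime.cwd} are left unresolved here.
print(OmegaConf.to_yaml(cfg.pipeline))
```
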
## Video

<details>
<summary><b>Official 224x224 model, face alignment "none"</b></summary>

[![Video](https://i.imgur.com/iCujdRB.jpg)](https://vimeo.com/728346715)

</details>

<details>
<summary><b>Official 224x224 model, face alignment "ffhq"</b></summary>

[![Video](https://i.imgur.com/48hjJO4.jpg)](https://vimeo.com/728348520)

</details>

<details>
<summary><b>Unofficial 512x512 model, face alignment "none"</b></summary>

[![Video](https://i.imgur.com/rRltD4U.jpg)](https://vimeo.com/728346542)

</details>

<details>
<summary><b>Unofficial 512x512 model, face alignment "ffhq"</b></summary>

[![Video](https://i.imgur.com/gFkpyXS.jpg)](https://vimeo.com/728349219)

</details>

## License
For academic and non-commercial use only. The whole project is under the CC BY-NC 4.0 license. See [LICENSE](https://github.com/neuralchen/SimSwap/blob/main/LICENSE) for additional details.

## Acknowledgements

<!--ts-->
* [SimSwap](https://github.com/neuralchen/SimSwap)
* [Insightface](https://github.com/deepinsight/insightface)
* [Face-parsing.PyTorch](https://github.com/zllrunning/face-parsing.PyTorch)
* [BiSeNet](https://github.com/CoinCheung/BiSeNet)
* [GFPGAN](https://github.com/TencentARC/GFPGAN)
<!--te-->
app.py ADDED
@@ -0,0 +1,74 @@
from pathlib import Path
from typing import Optional

import hydra
import numpy as np
from omegaconf import DictConfig
from tqdm import tqdm

from src.simswap import SimSwap
from src.DataManager.ImageDataManager import ImageDataManager
from src.DataManager.VideoDataManager import VideoDataManager
from src.DataManager.utils import imread_rgb


class Application:
    def __init__(self, config: DictConfig):
        id_image_path = Path(config.data.id_image)
        specific_id_image_path = Path(config.data.specific_id_image)
        att_image_path = Path(config.data.att_image)
        att_video_path = Path(config.data.att_video)
        output_dir = Path(config.data.output_dir)

        assert id_image_path.exists(), f"Can't find {id_image_path} file!"

        self.id_image: Optional[np.ndarray] = imread_rgb(id_image_path)
        self.specific_id_image: Optional[np.ndarray] = (
            imread_rgb(specific_id_image_path)
            if specific_id_image_path and specific_id_image_path.is_file()
            else None
        )

        # Attribute source is either a single image / folder of images ...
        self.att_image: Optional[ImageDataManager] = None
        if att_image_path and (att_image_path.is_file() or att_image_path.is_dir()):
            self.att_image = ImageDataManager(
                src_data=att_image_path, output_dir=output_dir
            )

        # ... or a video file.
        self.att_video: Optional[VideoDataManager] = None
        if att_video_path and att_video_path.is_file():
            self.att_video = VideoDataManager(
                src_data=att_video_path,
                output_dir=output_dir,
                clean_work_dir=config.data.clean_work_dir,
            )

        assert not (self.att_video and self.att_image), "Only one attribute source can be used!"

        self.data_manager = self.att_video if self.att_video else self.att_image

        self.model = SimSwap(
            config=config.pipeline,
            id_image=self.id_image,
            specific_image=self.specific_id_image,
        )

    def run(self):
        # Iterate over the frames / images, swap the faces and save the results.
        for _ in tqdm(range(len(self.data_manager))):
            att_img = self.data_manager.get()
            output = self.model(att_img)
            self.data_manager.save(output)


@hydra.main(config_path="configs/", config_name="run_image.yaml")
def main(config: DictConfig):
    app = Application(config)
    app.run()


if __name__ == "__main__":
    main()
app_web.py ADDED
@@ -0,0 +1,160 @@
1
+ import streamlit as st
2
+ from PIL import Image
3
+ from io import BytesIO
4
+ from collections import namedtuple
5
+ import numpy as np
6
+
7
+ from src.simswap import SimSwap
8
+
9
+
10
+ def run(model):
11
+ id_image = None
12
+ attr_image = None
13
+ specific_image = None
14
+ output = None
15
+
16
+ def get_np_image(file):
17
+ return np.array(Image.open(file))[:, :, :3]
18
+
19
+ with st.sidebar:
20
+ uploaded_file = st.file_uploader("Select an ID image")
21
+ if uploaded_file is not None:
22
+ id_image = get_np_image(uploaded_file)
23
+
24
+ uploaded_file = st.file_uploader("Select an Attribute image")
25
+ if uploaded_file is not None:
26
+ attr_image = get_np_image(uploaded_file)
27
+
28
+ uploaded_file = st.file_uploader("Select a specific person image (Optional)")
29
+ if uploaded_file is not None:
30
+ specific_image = get_np_image(uploaded_file)
31
+
32
+ face_alignment_type = st.radio("Face alignment type:", ("none", "ffhq"))
33
+
34
+ enhance_output = st.radio("Enhance output:", ("yes", "no"))
35
+
36
+ smooth_mask_iter = st.slider(
37
+ label="smooth_mask_iter", min_value=1, max_value=60, step=1, value=7
38
+ )
39
+
40
+ smooth_mask_kernel_size = st.slider(
41
+ label="smooth_mask_kernel_size", min_value=1, max_value=61, step=2, value=17
42
+ )
43
+
44
+ smooth_mask_threshold = st.slider(label="smooth_mask_threshold", min_value=0.01, max_value=1.0, step=0.01, value=0.9)
45
+
46
+ specific_latent_match_threshold = st.slider(
47
+ label="specific_latent_match_threshold",
48
+ min_value=0.0,
49
+ max_value=10.0,
50
+ value=0.05,
51
+ )
52
+
53
+ num_cols = sum(
54
+ (id_image is not None, attr_image is not None, specific_image is not None)
55
+ )
56
+ cols = st.columns(num_cols if num_cols > 0 else 1)
57
+ i = 0
58
+
59
+ if id_image is not None:
60
+ with cols[i]:
61
+ i += 1
62
+ st.header("ID image")
63
+ st.image(id_image)
64
+
65
+ if attr_image is not None:
66
+ with cols[i]:
67
+ i += 1
68
+ st.header("Attribute image")
69
+ st.image(attr_image)
70
+
71
+ if specific_image is not None:
72
+ with cols[i]:
73
+ st.header("Specific image")
74
+ st.image(specific_image)
75
+
76
+ if id_image is not None and attr_image is not None:
77
+ model.set_face_alignment_type(face_alignment_type)
78
+ model.set_smooth_mask_iter(smooth_mask_iter)
79
+ model.set_smooth_mask_kernel_size(smooth_mask_kernel_size)
80
+ model.set_smooth_mask_threshold(smooth_mask_threshold)
81
+ model.set_specific_latent_match_threshold(specific_latent_match_threshold)
82
+ model.enhance_output = True if enhance_output == "yes" else False
83
+
84
+ model.specific_latent = None
85
+ model.specific_id_image = specific_image if specific_image is not None else None
86
+
87
+ model.id_latent = None
88
+ model.id_image = id_image
89
+
90
+ output = model(attr_image)
91
+
92
+ if output is not None:
93
+ with st.container():
94
+ st.header("SimSwap output")
95
+ st.image(output)
96
+
97
+ output_to_download = Image.fromarray(output.astype("uint8"), "RGB")
98
+ buf = BytesIO()
99
+ output_to_download.save(buf, format="JPEG")
100
+
101
+ st.download_button(
102
+ label="Download",
103
+ data=buf.getvalue(),
104
+ file_name="output.jpg",
105
+ mime="image/jpeg",
106
+ )
107
+
108
+
109
+ @st.cache(allow_output_mutation=True)
110
+ def load_model(config):
111
+ return SimSwap(
112
+ config=config,
113
+ id_image=None,
114
+ specific_image=None,
115
+ )
116
+
117
+
118
+ # TODO: remove it and use config files from 'configs'
119
+ Config = namedtuple(
120
+ "Config",
121
+ "face_detector_weights"
122
+ + " face_id_weights"
123
+ + " parsing_model_weights"
124
+ + " simswap_weights"
125
+ + " gfpgan_weights"
126
+ + " blend_module_weights"
127
+ + " device"
128
+ + " crop_size"
129
+ + " checkpoint_type"
130
+ + " face_alignment_type"
131
+ + " smooth_mask_iter"
132
+ + " smooth_mask_kernel_size"
133
+ + " smooth_mask_threshold"
134
+ + " face_detector_threshold"
135
+ + " specific_latent_match_threshold"
136
+ + " enhance_output",
137
+ )
138
+
139
+ if __name__ == "__main__":
140
+ config = Config(
141
+ face_detector_weights="weights/scrfd_10g_bnkps.onnx",
142
+ face_id_weights="weights/arcface_net.jit",
143
+ parsing_model_weights="weights/79999_iter.pth",
144
+ simswap_weights="weights/latest_net_G.pth",
145
+ gfpgan_weights="weights/GFPGANv1.4_ema.pth",
146
+ blend_module_weights="weights/blend.jit",
147
+ device="cuda",
148
+ crop_size=224,
149
+ checkpoint_type="official_224",
150
+ face_alignment_type="none",
151
+ smooth_mask_iter=7,
152
+ smooth_mask_kernel_size=17,
153
+ smooth_mask_threshold=0.9,
154
+ face_detector_threshold=0.6,
155
+ specific_latent_match_threshold=0.05,
156
+ enhance_output=True
157
+ )
158
+
159
+ model = load_model(config)
160
+ run(model)
configs/run_image.yaml ADDED
@@ -0,0 +1,35 @@
1
+ data:
2
+ id_image: "${hydra:runtime.cwd}/demo_file/Iron_man.jpg"
3
+ att_image: "${hydra:runtime.cwd}/demo_file/multi_people.jpg"
4
+ specific_id_image: "none"
5
+ att_video: "none"
6
+ output_dir: ${hydra:runtime.cwd}/output
7
+
8
+ pipeline:
9
+ face_detector_weights: "${hydra:runtime.cwd}/weights/face_detector_scrfd_10g_bnkps.onnx"
10
+ face_id_weights: "${hydra:runtime.cwd}/weights/arcface_net.jit"
11
+ parsing_model_weights: "${hydra:runtime.cwd}/weights/79999_iter.pth"
12
+ simswap_weights: "${hydra:runtime.cwd}/weights/simswap_224_latest_net_G.pth"
13
+ gfpgan_weights: "${hydra:runtime.cwd}/weights/GFPGANv1.4_ema.pth"
14
+ blend_module_weights: "${hydra:runtime.cwd}/weights/blend_module.jit"
15
+ device: "cuda"
16
+ crop_size: 224
17
+ # it seems that the official 224 checkpoint works better with 'none' face alignment type
18
+ checkpoint_type: "official_224" #"none"
19
+ face_alignment_type: "none" #"ffhq"
20
+ smooth_mask_iter: 7
21
+ smooth_mask_kernel_size: 17
22
+ smooth_mask_threshold: 0.9
23
+ face_detector_threshold: 0.6
24
+ specific_latent_match_threshold: 0.05
25
+ enhance_output: True
26
+
27
+ defaults:
28
+ - _self_
29
+ - override hydra/hydra_logging: disabled
30
+ - override hydra/job_logging: disabled
31
+
32
+ hydra:
33
+ output_subdir: null
34
+ run:
35
+ dir: .
configs/run_image_specific.yaml ADDED
@@ -0,0 +1,35 @@
1
+ data:
2
+ id_image: "${hydra:runtime.cwd}/demo_file/Iron_man.jpg"
3
+ att_image: "${hydra:runtime.cwd}/demo_file/multi_people.jpg"
4
+ specific_id_image: "${hydra:runtime.cwd}/demo_file/specific1.png"
5
+ att_video: "none"
6
+ output_dir: ${hydra:runtime.cwd}/output
7
+
8
+ pipeline:
9
+ face_detector_weights: "${hydra:runtime.cwd}/weights/face_detector_scrfd_10g_bnkps.onnx"
10
+ face_id_weights: "${hydra:runtime.cwd}/weights/arcface_net.jit"
11
+ parsing_model_weights: "${hydra:runtime.cwd}/weights/79999_iter.pth"
12
+ simswap_weights: "${hydra:runtime.cwd}/weights/simswap_224_latest_net_G.pth"
13
+ gfpgan_weights: "${hydra:runtime.cwd}/weights/GFPGANv1.4_ema.pth"
14
+ blend_module_weights: "${hydra:runtime.cwd}/weights/blend_module.jit"
15
+ device: "cuda"
16
+ crop_size: 224
17
+ # it seems that the official 224 checkpoint works better with 'none' face alignment type
18
+ checkpoint_type: "official_224" #"none"
19
+ face_alignment_type: "none" #"ffhq"
20
+ smooth_mask_iter: 7
21
+ smooth_mask_kernel_size: 17
22
+ smooth_mask_threshold: 0.9
23
+ face_detector_threshold: 0.6
24
+ specific_latent_match_threshold: 0.05
25
+ enhance_output: True
26
+
27
+ defaults:
28
+ - _self_
29
+ - override hydra/hydra_logging: disabled
30
+ - override hydra/job_logging: disabled
31
+
32
+ hydra:
33
+ output_subdir: null
34
+ run:
35
+ dir: .
configs/run_video.yaml ADDED
@@ -0,0 +1,36 @@
1
+ data:
2
+ id_image: "${hydra:runtime.cwd}/demo_file/Iron_man.jpg"
3
+ att_image: "none"
4
+ specific_id_image: "none"
5
+ att_video: "${hydra:runtime.cwd}/demo_file/multi_people_1080p.mp4"
6
+ output_dir: ${hydra:runtime.cwd}/output
7
+ clean_work_dir: True
8
+
9
+ pipeline:
10
+ face_detector_weights: "${hydra:runtime.cwd}/weights/face_detector_scrfd_10g_bnkps.onnx"
11
+ face_id_weights: "${hydra:runtime.cwd}/weights/arcface_net.jit"
12
+ parsing_model_weights: "${hydra:runtime.cwd}/weights/79999_iter.pth"
13
+ simswap_weights: "${hydra:runtime.cwd}/weights/simswap_224_latest_net_G.pth"
14
+ gfpgan_weights: "${hydra:runtime.cwd}/weights/GFPGANv1.4_ema.pth"
15
+ blend_module_weights: "${hydra:runtime.cwd}/weights/blend_module.jit"
16
+ device: "cuda"
17
+ crop_size: 224
18
+ # it seems that the official 224 checkpoint works better with 'none' face alignment type
19
+ checkpoint_type: "official_224" #"none"
20
+ face_alignment_type: "none" #"ffhq"
21
+ smooth_mask_iter: 7
22
+ smooth_mask_kernel_size: 17
23
+ smooth_mask_threshold: 0.9
24
+ face_detector_threshold: 0.6
25
+ specific_latent_match_threshold: 0.05
26
+ enhance_output: True
27
+
28
+ defaults:
29
+ - _self_
30
+ - override hydra/hydra_logging: disabled
31
+ - override hydra/job_logging: disabled
32
+
33
+ hydra:
34
+ output_subdir: null
35
+ run:
36
+ dir: .
configs/run_video_specific.yaml ADDED
@@ -0,0 +1,36 @@
1
+ data:
2
+ id_image: "${hydra:runtime.cwd}/demo_file/Iron_man.jpg"
3
+ att_image: "none"
4
+ specific_id_image: "${hydra:runtime.cwd}/demo_file/specific1.png"
5
+ att_video: "${hydra:runtime.cwd}/demo_file/multi_people_1080p.mp4"
6
+ output_dir: ${hydra:runtime.cwd}/output
7
+ clean_work_dir: True
8
+
9
+ pipeline:
10
+ face_detector_weights: "${hydra:runtime.cwd}/weights/face_detector_scrfd_10g_bnkps.onnx"
11
+ face_id_weights: "${hydra:runtime.cwd}/weights/arcface_net.jit"
12
+ parsing_model_weights: "${hydra:runtime.cwd}/weights/79999_iter.pth"
13
+ simswap_weights: "${hydra:runtime.cwd}/weights/simswap_224_latest_net_G.pth"
14
+ gfpgan_weights: "${hydra:runtime.cwd}/weights/GFPGANv1.4_ema.pth"
15
+ blend_module_weights: "${hydra:runtime.cwd}/weights/blend_module.jit"
16
+ device: "cuda"
17
+ crop_size: 224
18
+ # it seems that the official 224 checkpoint works better with 'none' face alignment type
19
+ checkpoint_type: "official_224" #"none"
20
+ face_alignment_type: "none" #"ffhq"
21
+ smooth_mask_iter: 7
22
+ smooth_mask_kernel_size: 17
23
+ smooth_mask_threshold: 0.9
24
+ face_detector_threshold: 0.6
25
+ specific_latent_match_threshold: 0.05
26
+ enhance_output: True
27
+
28
+ defaults:
29
+ - _self_
30
+ - override hydra/hydra_logging: disabled
31
+ - override hydra/job_logging: disabled
32
+
33
+ hydra:
34
+ output_subdir: null
35
+ run:
36
+ dir: .
demo_file/Iron_man.jpg ADDED
demo_file/multi_people.jpg ADDED
demo_file/multi_people_1080p.mp4 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97fe960cc03abac34509ec69a68c7b75f2ca1325aea353456411fe7569d978e1
size 8735410
demo_file/multispecific/DST_01.jpg ADDED
demo_file/multispecific/DST_02.jpg ADDED
demo_file/multispecific/DST_03.jpg ADDED
demo_file/multispecific/SRC_01.png ADDED
demo_file/multispecific/SRC_02.png ADDED
demo_file/multispecific/SRC_03.png ADDED
demo_file/specific1.png ADDED
demo_file/specific2.png ADDED
demo_file/specific3.png ADDED
requirements.txt ADDED
@@ -0,0 +1,9 @@
hydra-core>=1.1.0
insightface==0.2.1
kornia==0.6.5
moviepy==1.0.3
onnx==1.12.0
onnxruntime==1.11.1
opencv-python==4.6.0.66
tqdm==4.64.0
streamlit==1.14.0
src/Blend/blend.py ADDED
@@ -0,0 +1,12 @@
import torch
import torch.nn as nn


class BlendModule(nn.Module):
    def __init__(self, model_path, device):
        super().__init__()

        self.model = torch.jit.load(model_path).to(device)

    def forward(self, swap, mask, att_img):
        return self.model(swap, mask, att_img)
src/DataManager/ImageDataManager.py ADDED
@@ -0,0 +1,42 @@
1
+ from src.DataManager.base import BaseDataManager
2
+ from src.DataManager.utils import imread_rgb, imwrite_rgb
3
+
4
+ import numpy as np
5
+ from pathlib import Path
6
+
7
+
8
+ class ImageDataManager(BaseDataManager):
9
+ def __init__(self, src_data: Path, output_dir: Path):
10
+ self.output_dir: Path = output_dir
11
+ self.output_dir.mkdir(exist_ok=True)
12
+ self.output_dir = output_dir / "img"
13
+ self.output_dir.mkdir(exist_ok=True)
14
+
15
+ self.data_paths = []
16
+ if src_data.is_file():
17
+ self.data_paths.append(src_data)
18
+ elif src_data.is_dir():
19
+ self.data_paths = (
20
+ list(src_data.glob("*.jpg"))
21
+ + list(src_data.glob("*.jpeg"))
22
+ + list(src_data.glob("*.png"))
23
+ )
24
+
25
+ assert len(self.data_paths), "Data must be supplied!"
26
+
27
+ self.data_paths_iter = iter(self.data_paths)
28
+
29
+ self.last_idx = -1
30
+
31
+ def __len__(self):
32
+ return len(self.data_paths)
33
+
34
+ def get(self) -> np.ndarray:
35
+ img_path = next(self.data_paths_iter)
36
+ self.last_idx += 1
37
+ return imread_rgb(img_path)
38
+
39
+ def save(self, img: np.ndarray):
40
+ filename = "swap_" + Path(self.data_paths[self.last_idx]).name
41
+
42
+ imwrite_rgb(self.output_dir / filename, img)
src/DataManager/VideoDataManager.py ADDED
@@ -0,0 +1,73 @@
1
+ from src.DataManager.base import BaseDataManager
2
+ from src.DataManager.utils import imwrite_rgb
3
+
4
+ import cv2
5
+ import numpy as np
6
+ from pathlib import Path
7
+ import shutil
8
+ from typing import Optional, Union
9
+
10
+ from moviepy.editor import AudioFileClip, VideoFileClip
11
+ from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
12
+
13
+
14
+ class VideoDataManager(BaseDataManager):
15
+ def __init__(self, src_data: Path, output_dir: Path, clean_work_dir: bool = False):
16
+ self.video_handle: Optional[cv2.VideoCapture] = None
17
+ self.audio_handle: Optional[AudioFileClip] = None
18
+
19
+ self.output_dir = output_dir
20
+ self.output_img_dir = output_dir / "img"
21
+ self.output_dir.mkdir(exist_ok=True)
22
+ self.output_img_dir.mkdir(exist_ok=True)
23
+ self.video_name = None
24
+ self.clean_work_dir = clean_work_dir
25
+
26
+ if src_data.is_file():
27
+ self.video_name = "swap_" + src_data.name
28
+
29
+ if VideoFileClip(str(src_data)).audio is not None:
30
+ self.audio_handle = AudioFileClip(str(src_data))
31
+
32
+ self.video_handle = cv2.VideoCapture(str(src_data))
33
+ self.video_handle.set(cv2.CAP_PROP_POS_FRAMES, 0)
34
+
35
+ self.frame_count = int(self.video_handle.get(cv2.CAP_PROP_FRAME_COUNT))
36
+ self.fps = self.video_handle.get(cv2.CAP_PROP_FPS)
37
+
38
+ self.last_idx = -1
39
+
40
+ assert self.video_handle, "Video file must be specified!"
41
+
42
+ def __len__(self):
43
+ return self.frame_count
44
+
45
+ def get(self) -> np.ndarray:
46
+ img: Union[None, np.ndarray] = None
47
+
48
+ while img is None and self.last_idx < self.frame_count:
49
+ status, img = self.video_handle.read()
50
+ self.last_idx += 1
51
+
52
+ if img is not None:
53
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
54
+ return img
55
+
56
+ def save(self, img: np.ndarray):
57
+ filename = "frame_{:0>7d}.jpg".format(self.last_idx)
58
+ imwrite_rgb(self.output_img_dir / filename, img)
59
+
60
+ if (self.frame_count - 1) == self.last_idx:
61
+ self._close()
62
+
63
+ def _close(self):
64
+ image_filenames = [str(x) for x in sorted(self.output_img_dir.glob("*.jpg"))]
65
+ clip = ImageSequenceClip(image_filenames, fps=self.fps)
66
+
67
+ if self.audio_handle is not None:
68
+ clip = clip.set_audio(self.audio_handle)
69
+
70
+ clip.write_videofile(str(self.output_dir / self.video_name))
71
+
72
+ if self.clean_work_dir:
73
+ shutil.rmtree(self.output_img_dir, ignore_errors=True)
src/DataManager/base.py ADDED
@@ -0,0 +1,16 @@
from abc import ABC, abstractmethod
import numpy as np


class BaseDataManager(ABC):
    @abstractmethod
    def __len__(self) -> int:
        pass

    @abstractmethod
    def get(self) -> np.ndarray:
        pass

    @abstractmethod
    def save(self, img: np.ndarray) -> None:
        pass
src/DataManager/utils.py ADDED
@@ -0,0 +1,12 @@
import cv2
import numpy as np
from pathlib import Path
from typing import Union


def imread_rgb(img_path: Union[str, Path]) -> np.ndarray:
    return cv2.cvtColor(cv2.imread(str(img_path)), cv2.COLOR_BGR2RGB)


def imwrite_rgb(img_path: Union[str, Path], img):
    return cv2.imwrite(str(img_path), cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
src/FaceAlign/face_align.py ADDED
@@ -0,0 +1,244 @@
1
+ import cv2
2
+ import numpy as np
3
+ import torch
4
+ from skimage import transform as skt
5
+ from typing import Iterable, Tuple
6
+
7
+ src1 = np.array(
8
+ [
9
+ [51.642, 50.115],
10
+ [57.617, 49.990],
11
+ [35.740, 69.007],
12
+ [51.157, 89.050],
13
+ [57.025, 89.702],
14
+ ],
15
+ dtype=np.float32,
16
+ )
17
+ # <--left
18
+ src2 = np.array(
19
+ [
20
+ [45.031, 50.118],
21
+ [65.568, 50.872],
22
+ [39.677, 68.111],
23
+ [45.177, 86.190],
24
+ [64.246, 86.758],
25
+ ],
26
+ dtype=np.float32,
27
+ )
28
+
29
+ # ---frontal
30
+ src3 = np.array(
31
+ [
32
+ [39.730, 51.138],
33
+ [72.270, 51.138],
34
+ [56.000, 68.493],
35
+ [42.463, 87.010],
36
+ [69.537, 87.010],
37
+ ],
38
+ dtype=np.float32,
39
+ )
40
+
41
+ # -->right
42
+ src4 = np.array(
43
+ [
44
+ [46.845, 50.872],
45
+ [67.382, 50.118],
46
+ [72.737, 68.111],
47
+ [48.167, 86.758],
48
+ [67.236, 86.190],
49
+ ],
50
+ dtype=np.float32,
51
+ )
52
+
53
+ # -->right profile
54
+ src5 = np.array(
55
+ [
56
+ [54.796, 49.990],
57
+ [60.771, 50.115],
58
+ [76.673, 69.007],
59
+ [55.388, 89.702],
60
+ [61.257, 89.050],
61
+ ],
62
+ dtype=np.float32,
63
+ )
64
+
65
+ src = np.array([src1, src2, src3, src4, src5])
66
+ src_map = src
67
+
68
+ ffhq_src = np.array(
69
+ [
70
+ [192.98138, 239.94708],
71
+ [318.90277, 240.1936],
72
+ [256.63416, 314.01935],
73
+ [201.26117, 371.41043],
74
+ [313.08905, 371.15118],
75
+ ]
76
+ )
77
+ ffhq_src = np.expand_dims(ffhq_src, axis=0)
78
+
79
+
80
+ # arcface_src = np.array(
81
+ # [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
82
+ # [41.5493, 92.3655], [70.7299, 92.2041]],
83
+ # dtype=np.float32)
84
+
85
+ # arcface_src = np.expand_dims(arcface_src, axis=0)
86
+
87
+ # In[66]:
88
+
89
+
90
+ # lmk is prediction; src is template
91
+ def estimate_norm(lmk, image_size=112, mode="ffhq"):
92
+ assert lmk.shape == (5, 2)
93
+ tform = skt.SimilarityTransform()
94
+ lmk_tran = np.insert(lmk, 2, values=np.ones(5), axis=1)
95
+ min_M = []
96
+ min_index = []
97
+ min_error = float("inf")
98
+ if mode == "ffhq":
99
+ # assert image_size == 112
100
+ src = ffhq_src * image_size / 512
101
+ else:
102
+ src = src_map * image_size / 112
103
+ for i in np.arange(src.shape[0]):
104
+ tform.estimate(lmk, src[i])
105
+ M = tform.params[0:2, :]
106
+ results = np.dot(M, lmk_tran.T)
107
+ results = results.T
108
+ error = np.sum(np.sqrt(np.sum((results - src[i]) ** 2, axis=1)))
109
+ if error < min_error:
110
+ min_error = error
111
+ min_M = M
112
+ min_index = i
113
+ return min_M, min_index
114
+
115
+
116
+ def norm_crop(img, landmark, image_size=112, mode="ffhq"):
117
+ if mode == "Both":
118
+ M_None, _ = estimate_norm(landmark, image_size, mode="newarc")
119
+ M_ffhq, _ = estimate_norm(landmark, image_size, mode="ffhq")
120
+ warped_None = cv2.warpAffine(
121
+ img, M_None, (image_size, image_size), borderValue=0.0
122
+ )
123
+ warped_ffhq = cv2.warpAffine(
124
+ img, M_ffhq, (image_size, image_size), borderValue=0.0
125
+ )
126
+ return warped_ffhq, warped_None
127
+ else:
128
+ M, pose_index = estimate_norm(landmark, image_size, mode)
129
+ warped = cv2.warpAffine(img, M, (image_size, image_size), borderValue=0.0)
130
+ return warped
131
+
132
+
133
+ def square_crop(im, S):
134
+ if im.shape[0] > im.shape[1]:
135
+ height = S
136
+ width = int(float(im.shape[1]) / im.shape[0] * S)
137
+ scale = float(S) / im.shape[0]
138
+ else:
139
+ width = S
140
+ height = int(float(im.shape[0]) / im.shape[1] * S)
141
+ scale = float(S) / im.shape[1]
142
+ resized_im = cv2.resize(im, (width, height))
143
+ det_im = np.zeros((S, S, 3), dtype=np.uint8)
144
+ det_im[: resized_im.shape[0], : resized_im.shape[1], :] = resized_im
145
+ return det_im, scale
146
+
147
+
148
+ def transform(data, center, output_size, scale, rotation):
149
+ scale_ratio = scale
150
+ rot = float(rotation) * np.pi / 180.0
151
+ # translation = (output_size/2-center[0]*scale_ratio, output_size/2-center[1]*scale_ratio)
152
+ t1 = skt.SimilarityTransform(scale=scale_ratio)
153
+ cx = center[0] * scale_ratio
154
+ cy = center[1] * scale_ratio
155
+ t2 = skt.SimilarityTransform(translation=(-1 * cx, -1 * cy))
156
+ t3 = skt.SimilarityTransform(rotation=rot)
157
+ t4 = skt.SimilarityTransform(translation=(output_size / 2, output_size / 2))
158
+ t = t1 + t2 + t3 + t4
159
+ M = t.params[0:2]
160
+ cropped = cv2.warpAffine(data, M, (output_size, output_size), borderValue=0.0)
161
+ return cropped, M
162
+
163
+
164
+ def trans_points2d(pts, M):
165
+ new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
166
+ for i in range(pts.shape[0]):
167
+ pt = pts[i]
168
+ new_pt = np.array([pt[0], pt[1], 1.0], dtype=np.float32)
169
+ new_pt = np.dot(M, new_pt)
170
+ # print('new_pt', new_pt.shape, new_pt)
171
+ new_pts[i] = new_pt[0:2]
172
+
173
+ return new_pts
174
+
175
+
176
+ def trans_points3d(pts, M):
177
+ scale = np.sqrt(M[0][0] * M[0][0] + M[0][1] * M[0][1])
178
+ # print(scale)
179
+ new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
180
+ for i in range(pts.shape[0]):
181
+ pt = pts[i]
182
+ new_pt = np.array([pt[0], pt[1], 1.0], dtype=np.float32)
183
+ new_pt = np.dot(M, new_pt)
184
+ # print('new_pt', new_pt.shape, new_pt)
185
+ new_pts[i][0:2] = new_pt[0:2]
186
+ new_pts[i][2] = pts[i][2] * scale
187
+
188
+ return new_pts
189
+
190
+
191
+ def trans_points(pts, M):
192
+ if pts.shape[1] == 2:
193
+ return trans_points2d(pts, M)
194
+ else:
195
+ return trans_points3d(pts, M)
196
+
197
+
198
+ def inverse_transform(mat: np.ndarray) -> np.ndarray:
199
+ # inverse the Affine transformation matrix
200
+ inv_mat = np.zeros([2, 3])
201
+ div1 = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]
202
+ inv_mat[0][0] = mat[1][1] / div1
203
+ inv_mat[0][1] = -mat[0][1] / div1
204
+ inv_mat[0][2] = -(mat[0][2] * mat[1][1] - mat[0][1] * mat[1][2]) / div1
205
+ div2 = mat[0][1] * mat[1][0] - mat[0][0] * mat[1][1]
206
+ inv_mat[1][0] = mat[1][0] / div2
207
+ inv_mat[1][1] = -mat[0][0] / div2
208
+ inv_mat[1][2] = -(mat[0][2] * mat[1][0] - mat[0][0] * mat[1][2]) / div2
209
+ return inv_mat
210
+
211
+
212
+ def inverse_transform_batch(mat: torch.Tensor) -> torch.Tensor:
213
+ # inverse the Affine transformation matrix
214
+ inv_mat = torch.zeros_like(mat)
215
+ div1 = mat[:, 0, 0] * mat[:, 1, 1] - mat[:, 0, 1] * mat[:, 1, 0]
216
+ inv_mat[:, 0, 0] = mat[:, 1, 1] / div1
217
+ inv_mat[:, 0, 1] = -mat[:, 0, 1] / div1
218
+ inv_mat[:, 0, 2] = (
219
+ -(mat[:, 0, 2] * mat[:, 1, 1] - mat[:, 0, 1] * mat[:, 1, 2]) / div1
220
+ )
221
+ div2 = mat[:, 0, 1] * mat[:, 1, 0] - mat[:, 0, 0] * mat[:, 1, 1]
222
+ inv_mat[:, 1, 0] = mat[:, 1, 0] / div2
223
+ inv_mat[:, 1, 1] = -mat[:, 0, 0] / div2
224
+ inv_mat[:, 1, 2] = (
225
+ -(mat[:, 0, 2] * mat[:, 1, 0] - mat[:, 0, 0] * mat[:, 1, 2]) / div2
226
+ )
227
+ return inv_mat
228
+
229
+
230
+ def align_face(
231
+ img: np.ndarray, key_points: np.ndarray, crop_size: int, mode: str = "ffhq"
232
+ ) -> Tuple[Iterable[np.ndarray], Iterable[np.ndarray]]:
233
+ align_imgs = []
234
+ transforms = []
235
+ for i in range(key_points.shape[0]):
236
+ kps = key_points[i]
237
+ transform_matrix, _ = estimate_norm(kps, crop_size, mode=mode)
238
+ align_img = cv2.warpAffine(
239
+ img, transform_matrix, (crop_size, crop_size), borderValue=0.0
240
+ )
241
+ align_imgs.append(align_img)
242
+ transforms.append(transform_matrix)
243
+
244
+ return align_imgs, transforms
src/FaceDetector/face_detector.py ADDED
@@ -0,0 +1,37 @@
1
+ from typing import NamedTuple, Optional, Tuple
2
+
3
+ from insightface.model_zoo import model_zoo
4
+ import numpy as np
5
+ from pathlib import Path
6
+
7
+
8
+ class Detection(NamedTuple):
9
+ bbox: Optional[np.ndarray]
10
+ score: Optional[np.ndarray]
11
+ key_points: Optional[np.ndarray]
12
+
13
+
14
+ class FaceDetector:
15
+ def __init__(
16
+ self,
17
+ model_path: Path,
18
+ det_thresh: float = 0.5,
19
+ det_size: Tuple[int, int] = (640, 640),
20
+ mode: str = "None",
21
+ device: str = "cpu",
22
+ ):
23
+ self.det_thresh = det_thresh
24
+ self.mode = mode
25
+ self.device = device
26
+ self.handler = model_zoo.get_model(str(model_path))
27
+ ctx_id = -1 if device == "cpu" else 0
28
+ self.handler.prepare(ctx_id, input_size=det_size)
29
+
30
+ def __call__(self, img: np.ndarray, max_num: int = 0) -> Detection:
31
+ bboxes, kpss = self.handler.detect(
32
+ img, threshold=self.det_thresh, max_num=max_num, metric="default"
33
+ )
34
+ if bboxes.shape[0] == 0:
35
+ return Detection(None, None, None)
36
+
37
+ return Detection(bboxes[..., :-1], bboxes[..., -1], kpss)
src/FaceId/faceid.py ADDED
@@ -0,0 +1,50 @@
1
+ import numpy as np
2
+
3
+ import torch
4
+ import torch.nn.functional as F
5
+ from torchvision import transforms
6
+
7
+ from typing import Iterable, Union
8
+ from pathlib import Path
9
+
10
+
11
+ class FaceId(torch.nn.Module):
12
+ def __init__(
13
+ self, model_path: Path, device: str, input_shape: Iterable[int] = (112, 112)
14
+ ):
15
+ super().__init__()
16
+
17
+ self.input_shape = input_shape
18
+ self.net = torch.load(model_path, map_location=torch.device("cpu"))
19
+ self.net.eval()
20
+
21
+ self.transform = transforms.Compose(
22
+ [
23
+ transforms.ToTensor(),
24
+ transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
25
+ ]
26
+ )
27
+
28
+ for n, p in self.net.named_parameters():
29
+ assert (
30
+ not p.requires_grad
31
+ ), f"Parameter {n}: requires_grad: {p.requires_grad}"
32
+
33
+ self.device = torch.device(device)
34
+ self.to(self.device)
35
+
36
+ def forward(
37
+ self, img_id: Union[np.ndarray, Iterable[np.ndarray]], normalize: bool = True
38
+ ) -> torch.Tensor:
39
+ if isinstance(img_id, Iterable):
40
+ img_id = [self.transform(x) for x in img_id]
41
+ img_id = torch.stack(img_id, dim=0)
42
+ else:
43
+ img_id = self.transform(img_id)
44
+ img_id = img_id.unsqueeze(0)
45
+
46
+ img_id = img_id.to(self.device)
47
+
48
+ img_id_112 = F.interpolate(img_id, size=self.input_shape)
49
+ latent_id = self.net(img_id_112)
50
+ return F.normalize(latent_id, p=2, dim=1) if normalize else latent_id
src/Generator/fs_networks_512.py ADDED
@@ -0,0 +1,277 @@
1
+ """
2
+ Author: Naiyuan liu
3
+ Github: https://github.com/NNNNAI
4
+ Date: 2021-11-23 16:55:48
5
+ LastEditors: Naiyuan liu
6
+ LastEditTime: 2021-11-24 16:58:06
7
+ Description:
8
+ """
9
+ """
10
+ Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
11
+ Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
12
+ """
13
+ import torch
14
+ import torch.nn as nn
15
+
16
+
17
+ class InstanceNorm(nn.Module):
18
+ def __init__(self, epsilon=1e-8):
19
+ """
20
+ @notice: avoid in-place ops.
21
+ https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836/3
22
+ """
23
+ super(InstanceNorm, self).__init__()
24
+ self.epsilon = epsilon
25
+
26
+ def forward(self, x):
27
+ x = x - torch.mean(x, (2, 3), True)
28
+ tmp = torch.mul(x, x) # or x ** 2
29
+ tmp = torch.rsqrt(torch.mean(tmp, (2, 3), True) + self.epsilon)
30
+ return x * tmp
31
+
32
+
33
+ class ApplyStyle(nn.Module):
34
+ """
35
+ @ref: https://github.com/lernapparat/lernapparat/blob/master/style_gan/pytorch_style_gan.ipynb
36
+ """
37
+
38
+ def __init__(self, latent_size, channels):
39
+ super(ApplyStyle, self).__init__()
40
+ self.linear = nn.Linear(latent_size, channels * 2)
41
+
42
+ def forward(self, x, latent):
43
+ style = self.linear(latent) # style => [batch_size, n_channels*2]
44
+ shape = [-1, 2, x.size(1), 1, 1]
45
+ style = style.view(shape) # [batch_size, 2, n_channels, ...]
46
+ # x = x * (style[:, 0] + 1.) + style[:, 1]
47
+ x = x * (style[:, 0] * 1 + 1.0) + style[:, 1] * 1
48
+ return x
49
+
50
+
51
+ class ResnetBlock_Adain(nn.Module):
52
+ def __init__(self, dim, latent_size, padding_type, activation=nn.ReLU(True)):
53
+ super(ResnetBlock_Adain, self).__init__()
54
+
55
+ p = 0
56
+ conv1 = []
57
+ if padding_type == "reflect":
58
+ conv1 += [nn.ReflectionPad2d(1)]
59
+ elif padding_type == "replicate":
60
+ conv1 += [nn.ReplicationPad2d(1)]
61
+ elif padding_type == "zero":
62
+ p = 1
63
+ else:
64
+ raise NotImplementedError("padding [%s] is not implemented" % padding_type)
65
+ conv1 += [nn.Conv2d(dim, dim, kernel_size=3, padding=p), InstanceNorm()]
66
+ self.conv1 = nn.Sequential(*conv1)
67
+ self.style1 = ApplyStyle(latent_size, dim)
68
+ self.act1 = activation
69
+
70
+ p = 0
71
+ conv2 = []
72
+ if padding_type == "reflect":
73
+ conv2 += [nn.ReflectionPad2d(1)]
74
+ elif padding_type == "replicate":
75
+ conv2 += [nn.ReplicationPad2d(1)]
76
+ elif padding_type == "zero":
77
+ p = 1
78
+ else:
79
+ raise NotImplementedError("padding [%s] is not implemented" % padding_type)
80
+ conv2 += [nn.Conv2d(dim, dim, kernel_size=3, padding=p), InstanceNorm()]
81
+ self.conv2 = nn.Sequential(*conv2)
82
+ self.style2 = ApplyStyle(latent_size, dim)
83
+
84
+ def forward(self, x, dlatents_in_slice):
85
+ y = self.conv1(x)
86
+ y = self.style1(y, dlatents_in_slice)
87
+ y = self.act1(y)
88
+ y = self.conv2(y)
89
+ y = self.style2(y, dlatents_in_slice)
90
+ out = x + y
91
+ return out
92
+
93
+
94
+ class Generator_Adain_Upsample(nn.Module):
95
+ def __init__(
96
+ self,
97
+ input_nc,
98
+ output_nc,
99
+ latent_size,
100
+ n_blocks=6,
101
+ deep=False,
102
+ norm_layer=nn.BatchNorm2d,
103
+ padding_type="reflect",
104
+ ):
105
+ assert n_blocks >= 0
106
+ super(Generator_Adain_Upsample, self).__init__()
107
+ activation = nn.ReLU(True)
108
+ self.deep = deep
109
+
110
+ self.first_layer = nn.Sequential(
111
+ nn.ReflectionPad2d(3),
112
+ nn.Conv2d(input_nc, 32, kernel_size=7, padding=0),
113
+ norm_layer(32),
114
+ activation,
115
+ )
116
+ # downsample
117
+ self.down0 = nn.Sequential(
118
+ nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
119
+ norm_layer(64),
120
+ activation,
121
+ )
122
+ self.down1 = nn.Sequential(
123
+ nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
124
+ norm_layer(128),
125
+ activation,
126
+ )
127
+ self.down2 = nn.Sequential(
128
+ nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
129
+ norm_layer(256),
130
+ activation,
131
+ )
132
+ self.down3 = nn.Sequential(
133
+ nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
134
+ norm_layer(512),
135
+ activation,
136
+ )
137
+ if self.deep:
138
+ self.down4 = nn.Sequential(
139
+ nn.Conv2d(512, 512, kernel_size=3, stride=2, padding=1),
140
+ norm_layer(512),
141
+ activation,
142
+ )
143
+
144
+ # resnet blocks
145
+ BN = []
146
+ for i in range(n_blocks):
147
+ BN += [
148
+ ResnetBlock_Adain(
149
+ 512,
150
+ latent_size=latent_size,
151
+ padding_type=padding_type,
152
+ activation=activation,
153
+ )
154
+ ]
155
+ self.BottleNeck = nn.Sequential(*BN)
156
+
157
+ if self.deep:
158
+ self.up4 = nn.Sequential(
159
+ nn.Upsample(scale_factor=2, mode="bilinear"),
160
+ nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
161
+ nn.BatchNorm2d(512),
162
+ activation,
163
+ )
164
+ self.up3 = nn.Sequential(
165
+ nn.Upsample(scale_factor=2, mode="bilinear"),
166
+ nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1),
167
+ nn.BatchNorm2d(256),
168
+ activation,
169
+ )
170
+ self.up2 = nn.Sequential(
171
+ nn.Upsample(scale_factor=2, mode="bilinear"),
172
+ nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),
173
+ nn.BatchNorm2d(128),
174
+ activation,
175
+ )
176
+ self.up1 = nn.Sequential(
177
+ nn.Upsample(scale_factor=2, mode="bilinear"),
178
+ nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
179
+ nn.BatchNorm2d(64),
180
+ activation,
181
+ )
182
+ self.up0 = nn.Sequential(
183
+ nn.Upsample(scale_factor=2, mode="bilinear"),
184
+ nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
185
+ nn.BatchNorm2d(32),
186
+ activation,
187
+ )
188
+ self.last_layer = nn.Sequential(
189
+ nn.ReflectionPad2d(3),
190
+ nn.Conv2d(32, output_nc, kernel_size=7, padding=0),
191
+ nn.Tanh(),
192
+ )
193
+
194
+ def forward(self, input, dlatents):
195
+ x = input # 3*224*224
196
+
197
+ skip0 = self.first_layer(x)
198
+ skip1 = self.down0(skip0)
199
+ skip2 = self.down1(skip1)
200
+ skip3 = self.down2(skip2)
201
+ if self.deep:
202
+ skip4 = self.down3(skip3)
203
+ x = self.down4(skip4)
204
+ else:
205
+ x = self.down3(skip3)
206
+
207
+ for i in range(len(self.BottleNeck)):
208
+ x = self.BottleNeck[i](x, dlatents)
209
+
210
+ if self.deep:
211
+ x = self.up4(x)
212
+ x = self.up3(x)
213
+ x = self.up2(x)
214
+ x = self.up1(x)
215
+ x = self.up0(x)
216
+ x = self.last_layer(x)
217
+ x = (x + 1) / 2
218
+
219
+ return x
220
+
221
+
222
+ class Discriminator(nn.Module):
223
+ def __init__(self, input_nc, norm_layer=nn.BatchNorm2d, use_sigmoid=False):
224
+ super(Discriminator, self).__init__()
225
+
226
+ kw = 4
227
+ padw = 1
228
+ self.down1 = nn.Sequential(
229
+ nn.Conv2d(input_nc, 64, kernel_size=kw, stride=2, padding=padw),
230
+ nn.LeakyReLU(0.2, True),
231
+ )
232
+ self.down2 = nn.Sequential(
233
+ nn.Conv2d(64, 128, kernel_size=kw, stride=2, padding=padw),
234
+ norm_layer(128),
235
+ nn.LeakyReLU(0.2, True),
236
+ )
237
+ self.down3 = nn.Sequential(
238
+ nn.Conv2d(128, 256, kernel_size=kw, stride=2, padding=padw),
239
+ norm_layer(256),
240
+ nn.LeakyReLU(0.2, True),
241
+ )
242
+ self.down4 = nn.Sequential(
243
+ nn.Conv2d(256, 512, kernel_size=kw, stride=2, padding=padw),
244
+ norm_layer(512),
245
+ nn.LeakyReLU(0.2, True),
246
+ )
247
+ self.conv1 = nn.Sequential(
248
+ nn.Conv2d(512, 512, kernel_size=kw, stride=1, padding=padw),
249
+ norm_layer(512),
250
+ nn.LeakyReLU(0.2, True),
251
+ )
252
+
253
+ if use_sigmoid:
254
+ self.conv2 = nn.Sequential(
255
+ nn.Conv2d(512, 1, kernel_size=kw, stride=1, padding=padw), nn.Sigmoid()
256
+ )
257
+ else:
258
+ self.conv2 = nn.Sequential(
259
+ nn.Conv2d(512, 1, kernel_size=kw, stride=1, padding=padw)
260
+ )
261
+
262
+ def forward(self, input):
263
+ out = []
264
+ x = self.down1(input)
265
+ out.append(x)
266
+ x = self.down2(x)
267
+ out.append(x)
268
+ x = self.down3(x)
269
+ out.append(x)
270
+ x = self.down4(x)
271
+ out.append(x)
272
+ x = self.conv1(x)
273
+ out.append(x)
274
+ x = self.conv2(x)
275
+ out.append(x)
276
+
277
+ return out
src/Generator/fs_networks_fix.py ADDED
@@ -0,0 +1,245 @@
1
+ """
2
+ Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
3
+ Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
4
+ """
5
+
6
+ import torch
7
+ import torch.nn as nn
8
+ from torchvision import transforms
9
+
10
+ from typing import Iterable
11
+ import numpy as np
12
+
13
+
14
+ class InstanceNorm(nn.Module):
15
+ def __init__(self, epsilon=1e-8):
16
+ """
17
+ @notice: avoid in-place ops.
18
+ https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836/3
19
+ """
20
+ super(InstanceNorm, self).__init__()
21
+ self.epsilon = epsilon
22
+
23
+ def forward(self, x):
24
+ x = x - torch.mean(x, (2, 3), True)
25
+ tmp = torch.mul(x, x) # or x ** 2
26
+ tmp = torch.rsqrt(torch.mean(tmp, (2, 3), True) + self.epsilon)
27
+ return x * tmp
28
+
29
+
30
+ class ApplyStyle(nn.Module):
31
+ """
32
+ @ref: https://github.com/lernapparat/lernapparat/blob/master/style_gan/pytorch_style_gan.ipynb
33
+ """
34
+
35
+ def __init__(self, latent_size, channels):
36
+ super(ApplyStyle, self).__init__()
37
+ self.linear = nn.Linear(latent_size, channels * 2)
38
+
39
+ def forward(self, x, latent):
40
+ style = self.linear(latent) # style => [batch_size, n_channels*2]
41
+ shape = [-1, 2, x.size(1), 1, 1]
42
+ style = style.view(shape) # [batch_size, 2, n_channels, ...]
43
+ # x = x * (style[:, 0] + 1.) + style[:, 1]
44
+ x = x * (style[:, 0] * 1 + 1.0) + style[:, 1] * 1
45
+ return x
46
+
47
+
48
+ class ResnetBlock_Adain(nn.Module):
49
+ def __init__(self, dim, latent_size, padding_type, activation=nn.ReLU(True)):
50
+ super(ResnetBlock_Adain, self).__init__()
51
+
52
+ p = 0
53
+ conv1 = []
54
+ if padding_type == "reflect":
55
+ conv1 += [nn.ReflectionPad2d(1)]
56
+ elif padding_type == "replicate":
57
+ conv1 += [nn.ReplicationPad2d(1)]
58
+ elif padding_type == "zero":
59
+ p = 1
60
+ else:
61
+ raise NotImplementedError("padding [%s] is not implemented" % padding_type)
62
+ conv1 += [nn.Conv2d(dim, dim, kernel_size=3, padding=p), InstanceNorm()]
63
+ self.conv1 = nn.Sequential(*conv1)
64
+ self.style1 = ApplyStyle(latent_size, dim)
65
+ self.act1 = activation
66
+
67
+ p = 0
68
+ conv2 = []
69
+ if padding_type == "reflect":
70
+ conv2 += [nn.ReflectionPad2d(1)]
71
+ elif padding_type == "replicate":
72
+ conv2 += [nn.ReplicationPad2d(1)]
73
+ elif padding_type == "zero":
74
+ p = 1
75
+ else:
76
+ raise NotImplementedError("padding [%s] is not implemented" % padding_type)
77
+ conv2 += [nn.Conv2d(dim, dim, kernel_size=3, padding=p), InstanceNorm()]
78
+ self.conv2 = nn.Sequential(*conv2)
79
+ self.style2 = ApplyStyle(latent_size, dim)
80
+
81
+ def forward(self, x, dlatents_in_slice):
82
+ y = self.conv1(x)
83
+ y = self.style1(y, dlatents_in_slice)
84
+ y = self.act1(y)
85
+ y = self.conv2(y)
86
+ y = self.style2(y, dlatents_in_slice)
87
+ out = x + y
88
+ return out
89
+
90
+
91
+ class Generator_Adain_Upsample(nn.Module):
92
+ def __init__(
93
+ self,
94
+ input_nc: int,
95
+ output_nc: int,
96
+ latent_size: int,
97
+ n_blocks: int = 6,
98
+ deep: bool = False,
99
+ use_last_act: bool = True,
100
+ norm_layer: torch.nn.Module = nn.BatchNorm2d,
101
+ padding_type: str = "reflect",
102
+ ):
103
+ assert n_blocks >= 0
104
+ super(Generator_Adain_Upsample, self).__init__()
105
+
106
+ activation = nn.ReLU(True)
107
+
108
+ self.deep = deep
109
+ self.use_last_act = use_last_act
110
+
111
+ self.to_tensor_normalize = transforms.Compose(
112
+ [
113
+ transforms.ToTensor(),
114
+ transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
115
+ ]
116
+ )
117
+
118
+ self.to_tensor = transforms.Compose([transforms.ToTensor()])
119
+
120
+ self.imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
121
+ self.imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
122
+
123
+ self.first_layer = nn.Sequential(
124
+ nn.ReflectionPad2d(3),
125
+ nn.Conv2d(input_nc, 64, kernel_size=7, padding=0),
126
+ norm_layer(64),
127
+ activation,
128
+ )
129
+ # downsample
130
+ self.down1 = nn.Sequential(
131
+ nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
132
+ norm_layer(128),
133
+ activation,
134
+ )
135
+ self.down2 = nn.Sequential(
136
+ nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
137
+ norm_layer(256),
138
+ activation,
139
+ )
140
+ self.down3 = nn.Sequential(
141
+ nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
142
+ norm_layer(512),
143
+ activation,
144
+ )
145
+
146
+ if self.deep:
147
+ self.down4 = nn.Sequential(
148
+ nn.Conv2d(512, 512, kernel_size=3, stride=2, padding=1),
149
+ norm_layer(512),
150
+ activation,
151
+ )
152
+
153
+ # resnet blocks
154
+ BN = []
155
+ for i in range(n_blocks):
156
+ BN += [
157
+ ResnetBlock_Adain(
158
+ 512,
159
+ latent_size=latent_size,
160
+ padding_type=padding_type,
161
+ activation=activation,
162
+ )
163
+ ]
164
+ self.BottleNeck = nn.Sequential(*BN)
165
+
166
+ if self.deep:
167
+ self.up4 = nn.Sequential(
168
+ nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
169
+ nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
170
+ nn.BatchNorm2d(512),
171
+ activation,
172
+ )
173
+ self.up3 = nn.Sequential(
174
+ nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
175
+ nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1),
176
+ nn.BatchNorm2d(256),
177
+ activation,
178
+ )
179
+ self.up2 = nn.Sequential(
180
+ nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
181
+ nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),
182
+ nn.BatchNorm2d(128),
183
+ activation,
184
+ )
185
+ self.up1 = nn.Sequential(
186
+ nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
187
+ nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
188
+ nn.BatchNorm2d(64),
189
+ activation,
190
+ )
191
+ if self.use_last_act:
192
+ self.last_layer = nn.Sequential(
193
+ nn.ReflectionPad2d(3),
194
+ nn.Conv2d(64, output_nc, kernel_size=7, padding=0),
195
+ torch.nn.Tanh(),
196
+ )
197
+ else:
198
+ self.last_layer = nn.Sequential(
199
+ nn.ReflectionPad2d(3),
200
+ nn.Conv2d(64, output_nc, kernel_size=7, padding=0),
201
+ )
202
+
203
+ def to(self, device):
204
+ super().to(device)
205
+ self.device = device
206
+ self.imagenet_mean = self.imagenet_mean.to(device)
207
+ self.imagenet_std = self.imagenet_std.to(device)
208
+ return self
209
+
210
+ def forward(self, x: Iterable[np.ndarray], dlatents: torch.Tensor):
211
+ if self.use_last_act:
212
+ x = [self.to_tensor(_) for _ in x]
213
+ else:
214
+ x = [self.to_tensor_normalize(_) for _ in x]
215
+
216
+ x = torch.stack(x, dim=0)
217
+
218
+ x = x.to(self.device)
219
+
220
+ skip1 = self.first_layer(x)
221
+ skip2 = self.down1(skip1)
222
+ skip3 = self.down2(skip2)
223
+ if self.deep:
224
+ skip4 = self.down3(skip3)
225
+ x = self.down4(skip4)
226
+ else:
227
+ x = self.down3(skip3)
228
+
229
+ for i in range(len(self.BottleNeck)):
230
+ x = self.BottleNeck[i](x, dlatents)
231
+
232
+ if self.deep:
233
+ x = self.up4(x)
234
+
235
+ x = self.up3(x)
236
+ x = self.up2(x)
237
+ x = self.up1(x)
238
+ x = self.last_layer(x)
239
+
240
+ if self.use_last_act:
241
+ x = (x + 1) / 2
242
+ else:
243
+ x = x * self.imagenet_std + self.imagenet_mean
244
+
245
+ return x
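
For reference, the generator's forward pass takes an iterable of aligned RGB face crops (HWC numpy arrays) together with a (batch, 512) identity latent and returns a batch of swapped crops in [0, 1]. A minimal smoke-test sketch of that interface, using random weights instead of a released checkpoint (so the output is noise, not a real swap):

```python
import numpy as np
import torch
from src.Generator.fs_networks_fix import Generator_Adain_Upsample

# 224px configuration; random weights only, no checkpoint is loaded here.
net = Generator_Adain_Upsample(
    input_nc=3, output_nc=3, latent_size=512, n_blocks=9,
    deep=False, use_last_act=True,
).to(torch.device("cpu"))
net.eval()

crop = np.random.rand(224, 224, 3).astype(np.float32)  # one aligned RGB crop in [0, 1]
id_latent = torch.randn(1, 512)                         # stand-in for an ArcFace embedding

with torch.no_grad():
    swapped = net([crop], id_latent)  # -> (1, 3, 224, 224), values in [0, 1]
print(swapped.shape)
```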
src/Misc/types.py ADDED
@@ -0,0 +1,11 @@
+ from enum import Enum
+
+
+ class CheckpointType(Enum):
+     OFFICIAL_224 = "official_224"
+     UNOFFICIAL = "none"
+
+
+ class FaceAlignmentType(Enum):
+     FFHQ = "ffhq"
+     DEFAULT = "none"
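
The enum values look like the strings a run config would carry, so a configured string can be turned into an enum member by value lookup (illustration only, not part of this upload):

```python
from src.Misc.types import CheckpointType, FaceAlignmentType

ckpt = CheckpointType("official_224")   # -> CheckpointType.OFFICIAL_224
align = FaceAlignmentType("ffhq")       # -> FaceAlignmentType.FFHQ
print(ckpt, align)
```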
src/Misc/utils.py ADDED
@@ -0,0 +1,28 @@
+ import torch
+ import numpy as np
+ import cv2
+
+
+ def tensor2img_denorm(tensor):
+     std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
+     mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
+     tensor = std * tensor.detach().cpu() + mean
+     img = tensor.numpy()
+     img = img.transpose(0, 2, 3, 1)[0]
+     img = np.clip(img * 255, 0.0, 255.0).astype(np.uint8)
+     return img
+
+
+ def tensor2img(tensor):
+     tensor = tensor.detach().cpu().numpy()
+     img = tensor.transpose(0, 2, 3, 1)[0]
+     img = np.clip(img * 255, 0.0, 255.0).astype(np.uint8)
+     return img
+
+
+ def show_tensor(tensor, name):
+     img = cv2.cvtColor(tensor2img(tensor), cv2.COLOR_RGB2BGR)
+
+     cv2.namedWindow(name, cv2.WINDOW_NORMAL)
+     cv2.imshow(name, img)
+     cv2.waitKey()
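
A quick illustration of the two conversion helpers: `tensor2img` expects an (N, 3, H, W) tensor already in [0, 1], while `tensor2img_denorm` first undoes ImageNet normalization; both return the first image as an HWC uint8 array (sketch only):

```python
import torch
from src.Misc.utils import tensor2img, tensor2img_denorm

batch = torch.rand(1, 3, 224, 224)                     # values in [0, 1]
img = tensor2img(batch)                                # (224, 224, 3) uint8

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
img_denorm = tensor2img_denorm((batch - mean) / std)   # round-trips back to ~img
print(img.shape, img.dtype, img_denorm.shape)
```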
src/PostProcess/GFPGAN/gfpgan.py ADDED
@@ -0,0 +1,341 @@
1
+ import math
2
+ import random
3
+ import torch
4
+ import torch.nn.functional as F
5
+
6
+ from src.PostProcess.GFPGAN.stylegan2 import StyleGAN2GeneratorClean
7
+
8
+
9
+ class StyleGAN2GeneratorCSFT(StyleGAN2GeneratorClean):
10
+ """StyleGAN2 Generator with SFT modulation (Spatial Feature Transform).
11
+ It is the clean version without custom compiled CUDA extensions used in StyleGAN2.
12
+ Args:
13
+ out_size (int): The spatial size of outputs.
14
+ num_style_feat (int): Channel number of style features. Default: 512.
15
+ num_mlp (int): Layer number of MLP style layers. Default: 8.
16
+ channel_multiplier (int): Channel multiplier for large networks of StyleGAN2. Default: 2.
17
+ narrow (float): The narrow ratio for channels. Default: 1.
18
+ sft_half (bool): Whether to apply SFT on half of the input channels. Default: False.
19
+ """
20
+
21
+ def __init__(self, out_size, num_style_feat=512, num_mlp=8, channel_multiplier=2, narrow=1, sft_half=False):
22
+ super(StyleGAN2GeneratorCSFT, self).__init__(
23
+ out_size,
24
+ num_style_feat=num_style_feat,
25
+ num_mlp=num_mlp,
26
+ channel_multiplier=channel_multiplier,
27
+ narrow=narrow)
28
+ self.sft_half = sft_half
29
+
30
+ def forward(self,
31
+ styles,
32
+ conditions,
33
+ input_is_latent=False,
34
+ noise=None,
35
+ randomize_noise=True,
36
+ truncation=1,
37
+ truncation_latent=None,
38
+ inject_index=None,
39
+ return_latents=False):
40
+ """Forward function for StyleGAN2GeneratorCSFT.
41
+ Args:
42
+ styles (list[Tensor]): Sample codes of styles.
43
+ conditions (list[Tensor]): SFT conditions to generators.
44
+ input_is_latent (bool): Whether input is latent style. Default: False.
45
+ noise (Tensor | None): Input noise or None. Default: None.
46
+ randomize_noise (bool): Randomize noise, used when 'noise' is False. Default: True.
47
+ truncation (float): The truncation ratio. Default: 1.
48
+ truncation_latent (Tensor | None): The truncation latent tensor. Default: None.
49
+ inject_index (int | None): The injection index for mixing noise. Default: None.
50
+ return_latents (bool): Whether to return style latents. Default: False.
51
+ """
52
+ # style codes -> latents with Style MLP layer
53
+ if not input_is_latent:
54
+ styles = [self.style_mlp(s) for s in styles]
55
+ # noises
56
+ if noise is None:
57
+ if randomize_noise:
58
+ noise = [None] * self.num_layers # for each style conv layer
59
+ else: # use the stored noise
60
+ noise = [getattr(self.noises, f'noise{i}') for i in range(self.num_layers)]
61
+ # style truncation
62
+ if truncation < 1:
63
+ style_truncation = []
64
+ for style in styles:
65
+ style_truncation.append(truncation_latent + truncation * (style - truncation_latent))
66
+ styles = style_truncation
67
+ # get style latents with injection
68
+ if len(styles) == 1:
69
+ inject_index = self.num_latent
70
+
71
+ if styles[0].ndim < 3:
72
+ # repeat latent code for all the layers
73
+ latent = styles[0].unsqueeze(1).repeat(1, inject_index, 1)
74
+ else: # used for encoder with different latent code for each layer
75
+ latent = styles[0]
76
+ elif len(styles) == 2: # mixing noises
77
+ if inject_index is None:
78
+ inject_index = random.randint(1, self.num_latent - 1)
79
+ latent1 = styles[0].unsqueeze(1).repeat(1, inject_index, 1)
80
+ latent2 = styles[1].unsqueeze(1).repeat(1, self.num_latent - inject_index, 1)
81
+ latent = torch.cat([latent1, latent2], 1)
82
+
83
+ # main generation
84
+ out = self.constant_input(latent.shape[0])
85
+ out = self.style_conv1(out, latent[:, 0], noise=noise[0])
86
+ skip = self.to_rgb1(out, latent[:, 1])
87
+
88
+ i = 1
89
+ for conv1, conv2, noise1, noise2, to_rgb in zip(self.style_convs[::2], self.style_convs[1::2], noise[1::2],
90
+ noise[2::2], self.to_rgbs):
91
+ out = conv1(out, latent[:, i], noise=noise1)
92
+
93
+ # the conditions may have fewer levels
94
+ if i < len(conditions):
95
+ # SFT part to combine the conditions
96
+ if self.sft_half: # only apply SFT to half of the channels
97
+ out_same, out_sft = torch.split(out, int(out.size(1) // 2), dim=1)
98
+ out_sft = out_sft * conditions[i - 1] + conditions[i]
99
+ out = torch.cat([out_same, out_sft], dim=1)
100
+ else: # apply SFT to all the channels
101
+ out = out * conditions[i - 1] + conditions[i]
102
+
103
+ out = conv2(out, latent[:, i + 1], noise=noise2)
104
+ skip = to_rgb(out, latent[:, i + 2], skip) # feature back to the rgb space
105
+ i += 2
106
+
107
+ image = skip
108
+
109
+ if return_latents:
110
+ return image, latent
111
+ else:
112
+ return image, None
113
+
114
+
115
+ class ResBlock(torch.nn.Module):
116
+ """Residual block with bilinear upsampling/downsampling.
117
+ Args:
118
+ in_channels (int): Channel number of the input.
119
+ out_channels (int): Channel number of the output.
120
+ mode (str): Upsampling/downsampling mode. Options: down | up. Default: down.
121
+ """
122
+
123
+ def __init__(self, in_channels, out_channels, mode='down'):
124
+ super(ResBlock, self).__init__()
125
+
126
+ self.conv1 = torch.nn.Conv2d(in_channels, in_channels, 3, 1, 1)
127
+ self.conv2 = torch.nn.Conv2d(in_channels, out_channels, 3, 1, 1)
128
+ self.skip = torch.nn.Conv2d(in_channels, out_channels, 1, bias=False)
129
+ if mode == 'down':
130
+ self.scale_factor = 0.5
131
+ elif mode == 'up':
132
+ self.scale_factor = 2
133
+
134
+ def forward(self, x):
135
+ out = F.leaky_relu_(self.conv1(x), negative_slope=0.2)
136
+ # upsample/downsample
137
+ out = F.interpolate(out, scale_factor=self.scale_factor, mode='bilinear', align_corners=False)
138
+ out = F.leaky_relu_(self.conv2(out), negative_slope=0.2)
139
+ # skip
140
+ x = F.interpolate(x, scale_factor=self.scale_factor, mode='bilinear', align_corners=False)
141
+ skip = self.skip(x)
142
+ out = out + skip
143
+ return out
144
+
145
+
146
+ class GFPGANv1Clean(torch.nn.Module):
147
+ """The GFPGAN architecture: Unet + StyleGAN2 decoder with SFT.
148
+ It is the clean version without custom compiled CUDA extensions used in StyleGAN2.
149
+ Ref: GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior.
150
+ Args:
151
+ out_size (int): The spatial size of outputs.
152
+ num_style_feat (int): Channel number of style features. Default: 512.
153
+ channel_multiplier (int): Channel multiplier for large networks of StyleGAN2. Default: 2.
154
+ decoder_load_path (str): The path to the pre-trained decoder model (usually, the StyleGAN2). Default: None.
155
+ fix_decoder (bool): Whether to fix the decoder. Default: True.
156
+ num_mlp (int): Layer number of MLP style layers. Default: 8.
157
+ input_is_latent (bool): Whether input is latent style. Default: False.
158
+ different_w (bool): Whether to use different latent w for different layers. Default: False.
159
+ narrow (float): The narrow ratio for channels. Default: 1.
160
+ sft_half (bool): Whether to apply SFT on half of the input channels. Default: False.
161
+ """
162
+
163
+ def __init__(
164
+ self,
165
+ out_size,
166
+ num_style_feat=512,
167
+ channel_multiplier=1,
168
+ decoder_load_path=None,
169
+ fix_decoder=True,
170
+ # for stylegan decoder
171
+ num_mlp=8,
172
+ input_is_latent=False,
173
+ different_w=False,
174
+ narrow=1,
175
+ sft_half=False):
176
+
177
+ super(GFPGANv1Clean, self).__init__()
178
+ self.input_is_latent = input_is_latent
179
+ self.different_w = different_w
180
+ self.num_style_feat = num_style_feat
181
+
182
+ unet_narrow = narrow * 0.5 # by default, use a half of input channels
183
+ channels = {
184
+ '4': int(512 * unet_narrow),
185
+ '8': int(512 * unet_narrow),
186
+ '16': int(512 * unet_narrow),
187
+ '32': int(512 * unet_narrow),
188
+ '64': int(256 * channel_multiplier * unet_narrow),
189
+ '128': int(128 * channel_multiplier * unet_narrow),
190
+ '256': int(64 * channel_multiplier * unet_narrow),
191
+ '512': int(32 * channel_multiplier * unet_narrow),
192
+ '1024': int(16 * channel_multiplier * unet_narrow)
193
+ }
194
+
195
+ self.log_size = int(math.log(out_size, 2))
196
+ first_out_size = 2**(int(math.log(out_size, 2)))
197
+
198
+ self.conv_body_first = torch.nn.Conv2d(3, channels[f'{first_out_size}'], 1)
199
+
200
+ # downsample
201
+ in_channels = channels[f'{first_out_size}']
202
+ self.conv_body_down = torch.nn.ModuleList()
203
+ for i in range(self.log_size, 2, -1):
204
+ out_channels = channels[f'{2**(i - 1)}']
205
+ self.conv_body_down.append(ResBlock(in_channels, out_channels, mode='down'))
206
+ in_channels = out_channels
207
+
208
+ self.final_conv = torch.nn.Conv2d(in_channels, channels['4'], 3, 1, 1)
209
+
210
+ # upsample
211
+ in_channels = channels['4']
212
+ self.conv_body_up = torch.nn.ModuleList()
213
+ for i in range(3, self.log_size + 1):
214
+ out_channels = channels[f'{2**i}']
215
+ self.conv_body_up.append(ResBlock(in_channels, out_channels, mode='up'))
216
+ in_channels = out_channels
217
+
218
+ # to RGB
219
+ self.toRGB = torch.nn.ModuleList()
220
+ for i in range(3, self.log_size + 1):
221
+ self.toRGB.append(torch.nn.Conv2d(channels[f'{2**i}'], 3, 1))
222
+
223
+ if different_w:
224
+ linear_out_channel = (int(math.log(out_size, 2)) * 2 - 2) * num_style_feat
225
+ else:
226
+ linear_out_channel = num_style_feat
227
+
228
+ self.final_linear = torch.nn.Linear(channels['4'] * 4 * 4, linear_out_channel)
229
+
230
+ # the decoder: stylegan2 generator with SFT modulations
231
+ self.stylegan_decoder = StyleGAN2GeneratorCSFT(
232
+ out_size=out_size,
233
+ num_style_feat=num_style_feat,
234
+ num_mlp=num_mlp,
235
+ channel_multiplier=channel_multiplier,
236
+ narrow=narrow,
237
+ sft_half=sft_half)
238
+
239
+ # load pre-trained stylegan2 model if necessary
240
+ if decoder_load_path:
241
+ self.stylegan_decoder.load_state_dict(
242
+ torch.load(decoder_load_path, map_location=lambda storage, loc: storage)['params_ema'])
243
+ # fix decoder without updating params
244
+ if fix_decoder:
245
+ for _, param in self.stylegan_decoder.named_parameters():
246
+ param.requires_grad = False
247
+
248
+ # for SFT modulations (scale and shift)
249
+ self.condition_scale = torch.nn.ModuleList()
250
+ self.condition_shift = torch.nn.ModuleList()
251
+ for i in range(3, self.log_size + 1):
252
+ out_channels = channels[f'{2**i}']
253
+ if sft_half:
254
+ sft_out_channels = out_channels
255
+ else:
256
+ sft_out_channels = out_channels * 2
257
+ self.condition_scale.append(
258
+ torch.nn.Sequential(
259
+ torch.nn.Conv2d(out_channels, out_channels, 3, 1, 1), torch.nn.LeakyReLU(0.2, True),
260
+ torch.nn.Conv2d(out_channels, sft_out_channels, 3, 1, 1)))
261
+ self.condition_shift.append(
262
+ torch.nn.Sequential(
263
+ torch.nn.Conv2d(out_channels, out_channels, 3, 1, 1), torch.nn.LeakyReLU(0.2, True),
264
+ torch.nn.Conv2d(out_channels, sft_out_channels, 3, 1, 1)))
265
+
266
+ def forward(self, x, return_latents=False, return_rgb=True, randomize_noise=True, **kwargs):
267
+ """Forward function for GFPGANv1Clean.
268
+ Args:
269
+ x (Tensor): Input images.
270
+ return_latents (bool): Whether to return style latents. Default: False.
271
+ return_rgb (bool): Whether return intermediate rgb images. Default: True.
272
+ randomize_noise (bool): Randomize noise, used when 'noise' is False. Default: True.
273
+ """
274
+ conditions = []
275
+ unet_skips = []
276
+ out_rgbs = []
277
+
278
+ # encoder
279
+ feat = F.leaky_relu_(self.conv_body_first(x), negative_slope=0.2)
280
+ for i in range(self.log_size - 2):
281
+ feat = self.conv_body_down[i](feat)
282
+ unet_skips.insert(0, feat)
283
+ feat = F.leaky_relu_(self.final_conv(feat), negative_slope=0.2)
284
+
285
+ # style code
286
+ style_code = self.final_linear(feat.view(feat.size(0), -1))
287
+ if self.different_w:
288
+ style_code = style_code.view(style_code.size(0), -1, self.num_style_feat)
289
+
290
+ # decode
291
+ for i in range(self.log_size - 2):
292
+ # add unet skip
293
+ feat = feat + unet_skips[i]
294
+ # ResUpLayer
295
+ feat = self.conv_body_up[i](feat)
296
+ # generate scale and shift for SFT layers
297
+ scale = self.condition_scale[i](feat)
298
+ conditions.append(scale.clone())
299
+ shift = self.condition_shift[i](feat)
300
+ conditions.append(shift.clone())
301
+ # generate rgb images
302
+ if return_rgb:
303
+ out_rgbs.append(self.toRGB[i](feat))
304
+
305
+ # decoder
306
+ image, _ = self.stylegan_decoder([style_code],
307
+ conditions,
308
+ return_latents=return_latents,
309
+ input_is_latent=self.input_is_latent,
310
+ randomize_noise=randomize_noise)
311
+
312
+ return image, out_rgbs
313
+
314
+
315
+ class GFPGANer(GFPGANv1Clean):
316
+ """Helper for restoration with GFPGAN."""
317
+
318
+ def __init__(self):
319
+ super().__init__(out_size=512, num_style_feat=512, channel_multiplier=2,
320
+ decoder_load_path=None, fix_decoder=False, num_mlp=8, input_is_latent=True,
321
+ different_w=True, narrow=1, sft_half=True)
322
+
323
+ self.min_max = (-1, 1)
324
+
325
+ @torch.no_grad()
326
+ def enhance(self, img, weight=0.5):
327
+ n, c, h, w = img.shape
328
+ img = F.interpolate(img, size=(512, 512), mode="bilinear")
329
+
330
+ img = (img - 0.5) / 0.5
331
+
332
+ try:
333
+ restored_faces = self.forward(img, return_rgb=False, weight=weight)[0]
334
+ except RuntimeError as error:
335
+ print(f'\tFailed inference for GFPGAN: {error}.')
336
+ restored_faces = img
337
+
338
+ restored_faces.clamp_(*self.min_max)
339
+ restored_faces = (restored_faces - self.min_max[0]) / (self.min_max[1] - self.min_max[0])
340
+
341
+ return F.interpolate(restored_faces, size=(h, w), mode="bilinear")
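
`GFPGANer.enhance` expects an (N, 3, H, W) batch in [0, 1]; it internally resizes to 512, maps to [-1, 1] for the network, and resizes the restored faces back to the input resolution. A sketch of the call with randomly initialized weights (load the GFPGAN checkpoint via `src.model_loader.get_model` for real restoration):

```python
import torch
from src.PostProcess.GFPGAN.gfpgan import GFPGANer

gfpgan = GFPGANer().eval()                 # random weights here, illustration only

faces = torch.rand(2, 3, 224, 224)         # face crops in [0, 1]
with torch.no_grad():
    restored = gfpgan.enhance(faces, weight=0.5)
print(restored.shape)                      # (2, 3, 224, 224), still in [0, 1]
```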
src/PostProcess/GFPGAN/stylegan2.py ADDED
@@ -0,0 +1,351 @@
1
+ import math
2
+ import random
3
+ import torch
4
+ from torch.nn import functional as F
5
+
6
+
7
+ class NormStyleCode(torch.nn.Module):
8
+
9
+ def forward(self, x):
10
+ """Normalize the style codes.
11
+ Args:
12
+ x (Tensor): Style codes with shape (b, c).
13
+ Returns:
14
+ Tensor: Normalized tensor.
15
+ """
16
+ return x * torch.rsqrt(torch.mean(x**2, dim=1, keepdim=True) + 1e-8)
17
+
18
+
19
+ class ModulatedConv2d(torch.nn.Module):
20
+ """Modulated Conv2d used in StyleGAN2.
21
+ There is no bias in ModulatedConv2d.
22
+ Args:
23
+ in_channels (int): Channel number of the input.
24
+ out_channels (int): Channel number of the output.
25
+ kernel_size (int): Size of the convolving kernel.
26
+ num_style_feat (int): Channel number of style features.
27
+ demodulate (bool): Whether to demodulate in the conv layer. Default: True.
28
+ sample_mode (str | None): Indicating 'upsample', 'downsample' or None. Default: None.
29
+ eps (float): A value added to the denominator for numerical stability. Default: 1e-8.
30
+ """
31
+
32
+ def __init__(self,
33
+ in_channels,
34
+ out_channels,
35
+ kernel_size,
36
+ num_style_feat,
37
+ demodulate=True,
38
+ sample_mode=None,
39
+ eps=1e-8):
40
+ super(ModulatedConv2d, self).__init__()
41
+ self.in_channels = in_channels
42
+ self.out_channels = out_channels
43
+ self.kernel_size = kernel_size
44
+ self.demodulate = demodulate
45
+ self.sample_mode = sample_mode
46
+ self.eps = eps
47
+
48
+ # modulation inside each modulated conv
49
+ self.modulation = torch.nn.Linear(num_style_feat, in_channels, bias=True)
50
+ # initialization
51
+ # default_init_weights(self.modulation, scale=1, bias_fill=1, a=0, mode='fan_in', nonlinearity='linear')
52
+
53
+ self.weight = torch.nn.Parameter(
54
+ torch.randn(1, out_channels, in_channels, kernel_size, kernel_size) /
55
+ math.sqrt(in_channels * kernel_size**2))
56
+ self.padding = kernel_size // 2
57
+
58
+ def forward(self, x, style):
59
+ """Forward function.
60
+ Args:
61
+ x (Tensor): Tensor with shape (b, c, h, w).
62
+ style (Tensor): Tensor with shape (b, num_style_feat).
63
+ Returns:
64
+ Tensor: Modulated tensor after convolution.
65
+ """
66
+ b, c, h, w = x.shape # c = c_in
67
+ # weight modulation
68
+ style = self.modulation(style).view(b, 1, c, 1, 1)
69
+ # self.weight: (1, c_out, c_in, k, k); style: (b, 1, c, 1, 1)
70
+ weight = self.weight * style # (b, c_out, c_in, k, k)
71
+
72
+ if self.demodulate:
73
+ demod = torch.rsqrt(weight.pow(2).sum([2, 3, 4]) + self.eps)
74
+ weight = weight * demod.view(b, self.out_channels, 1, 1, 1)
75
+
76
+ weight = weight.view(b * self.out_channels, c, self.kernel_size, self.kernel_size)
77
+
78
+ # upsample or downsample if necessary
79
+ if self.sample_mode == 'upsample':
80
+ x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
81
+ elif self.sample_mode == 'downsample':
82
+ x = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
83
+
84
+ b, c, h, w = x.shape
85
+ x = x.view(1, b * c, h, w)
86
+ # weight: (b*c_out, c_in, k, k), groups=b
87
+ out = F.conv2d(x, weight, padding=self.padding, groups=b)
88
+ out = out.view(b, self.out_channels, *out.shape[2:4])
89
+
90
+ return out
91
+
92
+ def __repr__(self):
93
+ return (f'{self.__class__.__name__}(in_channels={self.in_channels}, out_channels={self.out_channels}, '
94
+ f'kernel_size={self.kernel_size}, demodulate={self.demodulate}, sample_mode={self.sample_mode})')
95
+
96
+
97
+ class StyleConv(torch.nn.Module):
98
+ """Style conv used in StyleGAN2.
99
+ Args:
100
+ in_channels (int): Channel number of the input.
101
+ out_channels (int): Channel number of the output.
102
+ kernel_size (int): Size of the convolving kernel.
103
+ num_style_feat (int): Channel number of style features.
104
+ demodulate (bool): Whether demodulate in the conv layer. Default: True.
105
+ sample_mode (str | None): Indicating 'upsample', 'downsample' or None. Default: None.
106
+ """
107
+
108
+ def __init__(self, in_channels, out_channels, kernel_size, num_style_feat, demodulate=True, sample_mode=None):
109
+ super(StyleConv, self).__init__()
110
+ self.modulated_conv = ModulatedConv2d(
111
+ in_channels, out_channels, kernel_size, num_style_feat, demodulate=demodulate, sample_mode=sample_mode)
112
+ self.weight = torch.nn.Parameter(torch.zeros(1)) # for noise injection
113
+ self.bias = torch.nn.Parameter(torch.zeros(1, out_channels, 1, 1))
114
+ self.activate = torch.nn.LeakyReLU(negative_slope=0.2, inplace=True)
115
+
116
+ def forward(self, x, style, noise=None):
117
+ # modulate
118
+ out = self.modulated_conv(x, style) * 2**0.5 # for conversion
119
+ # noise injection
120
+ if noise is None:
121
+ b, _, h, w = out.shape
122
+ noise = out.new_empty(b, 1, h, w).normal_()
123
+ out = out + self.weight * noise
124
+ # add bias
125
+ out = out + self.bias
126
+ # activation
127
+ out = self.activate(out)
128
+ return out
129
+
130
+
131
+ class ToRGB(torch.nn.Module):
132
+ """To RGB (image space) from features.
133
+ Args:
134
+ in_channels (int): Channel number of input.
135
+ num_style_feat (int): Channel number of style features.
136
+ upsample (bool): Whether to upsample. Default: True.
137
+ """
138
+
139
+ def __init__(self, in_channels, num_style_feat, upsample=True):
140
+ super(ToRGB, self).__init__()
141
+ self.upsample = upsample
142
+ self.modulated_conv = ModulatedConv2d(
143
+ in_channels, 3, kernel_size=1, num_style_feat=num_style_feat, demodulate=False, sample_mode=None)
144
+ self.bias = torch.nn.Parameter(torch.zeros(1, 3, 1, 1))
145
+
146
+ def forward(self, x, style, skip=None):
147
+ """Forward function.
148
+ Args:
149
+ x (Tensor): Feature tensor with shape (b, c, h, w).
150
+ style (Tensor): Tensor with shape (b, num_style_feat).
151
+ skip (Tensor): Base/skip tensor. Default: None.
152
+ Returns:
153
+ Tensor: RGB images.
154
+ """
155
+ out = self.modulated_conv(x, style)
156
+ out = out + self.bias
157
+ if skip is not None:
158
+ if self.upsample:
159
+ skip = F.interpolate(skip, scale_factor=2, mode='bilinear', align_corners=False)
160
+ out = out + skip
161
+ return out
162
+
163
+
164
+ class ConstantInput(torch.nn.Module):
165
+ """Constant input.
166
+ Args:
167
+ num_channel (int): Channel number of constant input.
168
+ size (int): Spatial size of constant input.
169
+ """
170
+
171
+ def __init__(self, num_channel, size):
172
+ super(ConstantInput, self).__init__()
173
+ self.weight = torch.nn.Parameter(torch.randn(1, num_channel, size, size))
174
+
175
+ def forward(self, batch):
176
+ out = self.weight.repeat(batch, 1, 1, 1)
177
+ return out
178
+
179
+
180
+ class StyleGAN2GeneratorClean(torch.nn.Module):
181
+ """Clean version of StyleGAN2 Generator.
182
+ Args:
183
+ out_size (int): The spatial size of outputs.
184
+ num_style_feat (int): Channel number of style features. Default: 512.
185
+ num_mlp (int): Layer number of MLP style layers. Default: 8.
186
+ channel_multiplier (int): Channel multiplier for large networks of StyleGAN2. Default: 2.
187
+ narrow (float): Narrow ratio for channels. Default: 1.0.
188
+ """
189
+
190
+ def __init__(self, out_size, num_style_feat=512, num_mlp=8, channel_multiplier=2, narrow=1):
191
+ super(StyleGAN2GeneratorClean, self).__init__()
192
+ # Style MLP layers
193
+ self.num_style_feat = num_style_feat
194
+ style_mlp_layers = [NormStyleCode()]
195
+ for i in range(num_mlp):
196
+ style_mlp_layers.extend(
197
+ [torch.nn.Linear(num_style_feat, num_style_feat, bias=True),
198
+ torch.nn.LeakyReLU(negative_slope=0.2, inplace=True)])
199
+ self.style_mlp = torch.nn.Sequential(*style_mlp_layers)
200
+ # initialization
201
+ # default_init_weights(self.style_mlp, scale=1, bias_fill=0, a=0.2, mode='fan_in', nonlinearity='leaky_relu')
202
+
203
+ # channel list
204
+ channels = {
205
+ '4': int(512 * narrow),
206
+ '8': int(512 * narrow),
207
+ '16': int(512 * narrow),
208
+ '32': int(512 * narrow),
209
+ '64': int(256 * channel_multiplier * narrow),
210
+ '128': int(128 * channel_multiplier * narrow),
211
+ '256': int(64 * channel_multiplier * narrow),
212
+ '512': int(32 * channel_multiplier * narrow),
213
+ '1024': int(16 * channel_multiplier * narrow)
214
+ }
215
+ self.channels = channels
216
+
217
+ self.constant_input = ConstantInput(channels['4'], size=4)
218
+ self.style_conv1 = StyleConv(
219
+ channels['4'],
220
+ channels['4'],
221
+ kernel_size=3,
222
+ num_style_feat=num_style_feat,
223
+ demodulate=True,
224
+ sample_mode=None)
225
+ self.to_rgb1 = ToRGB(channels['4'], num_style_feat, upsample=False)
226
+
227
+ self.log_size = int(math.log(out_size, 2))
228
+ self.num_layers = (self.log_size - 2) * 2 + 1
229
+ self.num_latent = self.log_size * 2 - 2
230
+
231
+ self.style_convs = torch.nn.ModuleList()
232
+ self.to_rgbs = torch.nn.ModuleList()
233
+ self.noises = torch.nn.Module()
234
+
235
+ in_channels = channels['4']
236
+ # noise
237
+ for layer_idx in range(self.num_layers):
238
+ resolution = 2**((layer_idx + 5) // 2)
239
+ shape = [1, 1, resolution, resolution]
240
+ self.noises.register_buffer(f'noise{layer_idx}', torch.randn(*shape))
241
+ # style convs and to_rgbs
242
+ for i in range(3, self.log_size + 1):
243
+ out_channels = channels[f'{2**i}']
244
+ self.style_convs.append(
245
+ StyleConv(
246
+ in_channels,
247
+ out_channels,
248
+ kernel_size=3,
249
+ num_style_feat=num_style_feat,
250
+ demodulate=True,
251
+ sample_mode='upsample'))
252
+ self.style_convs.append(
253
+ StyleConv(
254
+ out_channels,
255
+ out_channels,
256
+ kernel_size=3,
257
+ num_style_feat=num_style_feat,
258
+ demodulate=True,
259
+ sample_mode=None))
260
+ self.to_rgbs.append(ToRGB(out_channels, num_style_feat, upsample=True))
261
+ in_channels = out_channels
262
+
263
+ def make_noise(self):
264
+ """Make noise for noise injection."""
265
+ device = self.constant_input.weight.device
266
+ noises = [torch.randn(1, 1, 4, 4, device=device)]
267
+
268
+ for i in range(3, self.log_size + 1):
269
+ for _ in range(2):
270
+ noises.append(torch.randn(1, 1, 2**i, 2**i, device=device))
271
+
272
+ return noises
273
+
274
+ def get_latent(self, x):
275
+ return self.style_mlp(x)
276
+
277
+ def mean_latent(self, num_latent):
278
+ latent_in = torch.randn(num_latent, self.num_style_feat, device=self.constant_input.weight.device)
279
+ latent = self.style_mlp(latent_in).mean(0, keepdim=True)
280
+ return latent
281
+
282
+ def forward(self,
283
+ styles,
284
+ input_is_latent=False,
285
+ noise=None,
286
+ randomize_noise=True,
287
+ truncation=1,
288
+ truncation_latent=None,
289
+ inject_index=None,
290
+ return_latents=False):
291
+ """Forward function for StyleGAN2GeneratorClean.
292
+ Args:
293
+ styles (list[Tensor]): Sample codes of styles.
294
+ input_is_latent (bool): Whether input is latent style. Default: False.
295
+ noise (Tensor | None): Input noise or None. Default: None.
296
+ randomize_noise (bool): Randomize noise, used when 'noise' is False. Default: True.
297
+ truncation (float): The truncation ratio. Default: 1.
298
+ truncation_latent (Tensor | None): The truncation latent tensor. Default: None.
299
+ inject_index (int | None): The injection index for mixing noise. Default: None.
300
+ return_latents (bool): Whether to return style latents. Default: False.
301
+ """
302
+ # style codes -> latents with Style MLP layer
303
+ if not input_is_latent:
304
+ styles = [self.style_mlp(s) for s in styles]
305
+ # noises
306
+ if noise is None:
307
+ if randomize_noise:
308
+ noise = [None] * self.num_layers # for each style conv layer
309
+ else: # use the stored noise
310
+ noise = [getattr(self.noises, f'noise{i}') for i in range(self.num_layers)]
311
+ # style truncation
312
+ if truncation < 1:
313
+ style_truncation = []
314
+ for style in styles:
315
+ style_truncation.append(truncation_latent + truncation * (style - truncation_latent))
316
+ styles = style_truncation
317
+ # get style latents with injection
318
+ if len(styles) == 1:
319
+ inject_index = self.num_latent
320
+
321
+ if styles[0].ndim < 3:
322
+ # repeat latent code for all the layers
323
+ latent = styles[0].unsqueeze(1).repeat(1, inject_index, 1)
324
+ else: # used for encoder with different latent code for each layer
325
+ latent = styles[0]
326
+ elif len(styles) == 2: # mixing noises
327
+ if inject_index is None:
328
+ inject_index = random.randint(1, self.num_latent - 1)
329
+ latent1 = styles[0].unsqueeze(1).repeat(1, inject_index, 1)
330
+ latent2 = styles[1].unsqueeze(1).repeat(1, self.num_latent - inject_index, 1)
331
+ latent = torch.cat([latent1, latent2], 1)
332
+
333
+ # main generation
334
+ out = self.constant_input(latent.shape[0])
335
+ out = self.style_conv1(out, latent[:, 0], noise=noise[0])
336
+ skip = self.to_rgb1(out, latent[:, 1])
337
+
338
+ i = 1
339
+ for conv1, conv2, noise1, noise2, to_rgb in zip(self.style_convs[::2], self.style_convs[1::2], noise[1::2],
340
+ noise[2::2], self.to_rgbs):
341
+ out = conv1(out, latent[:, i], noise=noise1)
342
+ out = conv2(out, latent[:, i + 1], noise=noise2)
343
+ skip = to_rgb(out, latent[:, i + 2], skip) # feature back to the rgb space
344
+ i += 2
345
+
346
+ image = skip
347
+
348
+ if return_latents:
349
+ return image, latent
350
+ else:
351
+ return image, None
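
As a standalone sanity check, the clean StyleGAN2 generator can be sampled directly from a random style code; with `randomize_noise=True` no stored noise buffers are needed (sketch with random weights):

```python
import torch
from src.PostProcess.GFPGAN.stylegan2 import StyleGAN2GeneratorClean

gen = StyleGAN2GeneratorClean(out_size=256, num_style_feat=512, num_mlp=8).eval()

z = torch.randn(1, 512)                    # one style code
with torch.no_grad():
    image, _ = gen([z], input_is_latent=False, randomize_noise=True)
print(image.shape)                         # (1, 3, 256, 256)
```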
src/PostProcess/ParsingModel/model.py ADDED
@@ -0,0 +1,323 @@
1
+ #!/usr/bin/python
2
+ # -*- encoding: utf-8 -*-
3
+
4
+
5
+ import torch
6
+ import torch.nn as nn
7
+ import torch.nn.functional as F
8
+
9
+ from src.PostProcess.ParsingModel.resnet import Resnet18
10
+
11
+ from src.PostProcess.utils import encode_segmentation_rgb_batch
12
+ from typing import Tuple
13
+
14
+
15
+ class ConvBNReLU(nn.Module):
16
+ def __init__(self, in_chan, out_chan, ks=3, stride=1, padding=1, *args, **kwargs):
17
+ super(ConvBNReLU, self).__init__()
18
+ self.conv = nn.Conv2d(
19
+ in_chan,
20
+ out_chan,
21
+ kernel_size=ks,
22
+ stride=stride,
23
+ padding=padding,
24
+ bias=False,
25
+ )
26
+ self.bn = nn.BatchNorm2d(out_chan)
27
+ self.init_weight()
28
+
29
+ def forward(self, x):
30
+ x = self.conv(x)
31
+ x = F.relu(self.bn(x))
32
+ return x
33
+
34
+ def init_weight(self):
35
+ for ly in self.children():
36
+ if isinstance(ly, nn.Conv2d):
37
+ nn.init.kaiming_normal_(ly.weight, a=1)
38
+ if ly.bias is not None:
39
+ nn.init.constant_(ly.bias, 0)
40
+
41
+
42
+ class BiSeNetOutput(nn.Module):
43
+ def __init__(self, in_chan, mid_chan, n_classes, *args, **kwargs):
44
+ super(BiSeNetOutput, self).__init__()
45
+ self.conv = ConvBNReLU(in_chan, mid_chan, ks=3, stride=1, padding=1)
46
+ self.conv_out = nn.Conv2d(mid_chan, n_classes, kernel_size=1, bias=False)
47
+ self.init_weight()
48
+
49
+ def forward(self, x):
50
+ x = self.conv(x)
51
+ x = self.conv_out(x)
52
+ return x
53
+
54
+ def init_weight(self):
55
+ for ly in self.children():
56
+ if isinstance(ly, nn.Conv2d):
57
+ nn.init.kaiming_normal_(ly.weight, a=1)
58
+ if ly.bias is not None:
59
+ nn.init.constant_(ly.bias, 0)
60
+
61
+ def get_params(self):
62
+ wd_params, nowd_params = [], []
63
+ for name, module in self.named_modules():
64
+ if isinstance(module, nn.Linear) or isinstance(module, nn.Conv2d):
65
+ wd_params.append(module.weight)
66
+ if module.bias is not None:
67
+ nowd_params.append(module.bias)
68
+ elif isinstance(module, nn.BatchNorm2d):
69
+ nowd_params += list(module.parameters())
70
+ return wd_params, nowd_params
71
+
72
+
73
+ class AttentionRefinementModule(nn.Module):
74
+ def __init__(self, in_chan, out_chan, *args, **kwargs):
75
+ super(AttentionRefinementModule, self).__init__()
76
+ self.conv = ConvBNReLU(in_chan, out_chan, ks=3, stride=1, padding=1)
77
+ self.conv_atten = nn.Conv2d(out_chan, out_chan, kernel_size=1, bias=False)
78
+ self.bn_atten = nn.BatchNorm2d(out_chan)
79
+ self.sigmoid_atten = nn.Sigmoid()
80
+ self.init_weight()
81
+
82
+ def forward(self, x):
83
+ feat = self.conv(x)
84
+ atten = F.avg_pool2d(feat, feat.size()[2:])
85
+ atten = self.conv_atten(atten)
86
+ atten = self.bn_atten(atten)
87
+ atten = self.sigmoid_atten(atten)
88
+ out = torch.mul(feat, atten)
89
+ return out
90
+
91
+ def init_weight(self):
92
+ for ly in self.children():
93
+ if isinstance(ly, nn.Conv2d):
94
+ nn.init.kaiming_normal_(ly.weight, a=1)
95
+ if ly.bias is not None:
96
+ nn.init.constant_(ly.bias, 0)
97
+
98
+
99
+ class ContextPath(nn.Module):
100
+ def __init__(self, *args, **kwargs):
101
+ super(ContextPath, self).__init__()
102
+ self.resnet = Resnet18()
103
+ self.arm16 = AttentionRefinementModule(256, 128)
104
+ self.arm32 = AttentionRefinementModule(512, 128)
105
+ self.conv_head32 = ConvBNReLU(128, 128, ks=3, stride=1, padding=1)
106
+ self.conv_head16 = ConvBNReLU(128, 128, ks=3, stride=1, padding=1)
107
+ self.conv_avg = ConvBNReLU(512, 128, ks=1, stride=1, padding=0)
108
+
109
+ self.init_weight()
110
+
111
+ def forward(self, x):
112
+ H0, W0 = x.size()[2:]
113
+ feat8, feat16, feat32 = self.resnet(x)
114
+ H8, W8 = feat8.size()[2:]
115
+ H16, W16 = feat16.size()[2:]
116
+ H32, W32 = feat32.size()[2:]
117
+
118
+ avg = F.avg_pool2d(feat32, feat32.size()[2:])
119
+ avg = self.conv_avg(avg)
120
+ avg_up = F.interpolate(avg, (H32, W32), mode="nearest")
121
+
122
+ feat32_arm = self.arm32(feat32)
123
+ feat32_sum = feat32_arm + avg_up
124
+ feat32_up = F.interpolate(feat32_sum, (H16, W16), mode="nearest")
125
+ feat32_up = self.conv_head32(feat32_up)
126
+
127
+ feat16_arm = self.arm16(feat16)
128
+ feat16_sum = feat16_arm + feat32_up
129
+ feat16_up = F.interpolate(feat16_sum, (H8, W8), mode="nearest")
130
+ feat16_up = self.conv_head16(feat16_up)
131
+
132
+ return feat8, feat16_up, feat32_up # x8, x8, x16
133
+
134
+ def init_weight(self):
135
+ for ly in self.children():
136
+ if isinstance(ly, nn.Conv2d):
137
+ nn.init.kaiming_normal_(ly.weight, a=1)
138
+ if ly.bias is not None:
139
+ nn.init.constant_(ly.bias, 0)
140
+
141
+ def get_params(self):
142
+ wd_params, nowd_params = [], []
143
+ for name, module in self.named_modules():
144
+ if isinstance(module, (nn.Linear, nn.Conv2d)):
145
+ wd_params.append(module.weight)
146
+ if module.bias is not None:
147
+ nowd_params.append(module.bias)
148
+ elif isinstance(module, nn.BatchNorm2d):
149
+ nowd_params += list(module.parameters())
150
+ return wd_params, nowd_params
151
+
152
+
153
+ # This is not used, since I replace this with the resnet feature with the same size
154
+ class SpatialPath(nn.Module):
155
+ def __init__(self, *args, **kwargs):
156
+ super(SpatialPath, self).__init__()
157
+ self.conv1 = ConvBNReLU(3, 64, ks=7, stride=2, padding=3)
158
+ self.conv2 = ConvBNReLU(64, 64, ks=3, stride=2, padding=1)
159
+ self.conv3 = ConvBNReLU(64, 64, ks=3, stride=2, padding=1)
160
+ self.conv_out = ConvBNReLU(64, 128, ks=1, stride=1, padding=0)
161
+ self.init_weight()
162
+
163
+ def forward(self, x):
164
+ feat = self.conv1(x)
165
+ feat = self.conv2(feat)
166
+ feat = self.conv3(feat)
167
+ feat = self.conv_out(feat)
168
+ return feat
169
+
170
+ def init_weight(self):
171
+ for ly in self.children():
172
+ if isinstance(ly, nn.Conv2d):
173
+ nn.init.kaiming_normal_(ly.weight, a=1)
174
+ if ly.bias is not None:
175
+ nn.init.constant_(ly.bias, 0)
176
+
177
+ def get_params(self):
178
+ wd_params, nowd_params = [], []
179
+ for name, module in self.named_modules():
180
+ if isinstance(module, nn.Linear) or isinstance(module, nn.Conv2d):
181
+ wd_params.append(module.weight)
182
+ if module.bias is not None:
183
+ nowd_params.append(module.bias)
184
+ elif isinstance(module, nn.BatchNorm2d):
185
+ nowd_params += list(module.parameters())
186
+ return wd_params, nowd_params
187
+
188
+
189
+ class FeatureFusionModule(nn.Module):
190
+ def __init__(self, in_chan, out_chan, *args, **kwargs):
191
+ super(FeatureFusionModule, self).__init__()
192
+ self.convblk = ConvBNReLU(in_chan, out_chan, ks=1, stride=1, padding=0)
193
+ self.conv1 = nn.Conv2d(
194
+ out_chan, out_chan // 4, kernel_size=1, stride=1, padding=0, bias=False
195
+ )
196
+ self.conv2 = nn.Conv2d(
197
+ out_chan // 4, out_chan, kernel_size=1, stride=1, padding=0, bias=False
198
+ )
199
+ self.relu = nn.ReLU(inplace=True)
200
+ self.sigmoid = nn.Sigmoid()
201
+ self.init_weight()
202
+
203
+ def forward(self, fsp, fcp):
204
+ fcat = torch.cat([fsp, fcp], dim=1)
205
+ feat = self.convblk(fcat)
206
+ atten = F.avg_pool2d(feat, feat.size()[2:])
207
+ atten = self.conv1(atten)
208
+ atten = self.relu(atten)
209
+ atten = self.conv2(atten)
210
+ atten = self.sigmoid(atten)
211
+ feat_atten = torch.mul(feat, atten)
212
+ feat_out = feat_atten + feat
213
+ return feat_out
214
+
215
+ def init_weight(self):
216
+ for ly in self.children():
217
+ if isinstance(ly, nn.Conv2d):
218
+ nn.init.kaiming_normal_(ly.weight, a=1)
219
+ if ly.bias is not None:
220
+ nn.init.constant_(ly.bias, 0)
221
+
222
+ def get_params(self):
223
+ wd_params, nowd_params = [], []
224
+ for name, module in self.named_modules():
225
+ if isinstance(module, nn.Linear) or isinstance(module, nn.Conv2d):
226
+ wd_params.append(module.weight)
227
+ if module.bias is not None:
228
+ nowd_params.append(module.bias)
229
+ elif isinstance(module, nn.BatchNorm2d):
230
+ nowd_params += list(module.parameters())
231
+ return wd_params, nowd_params
232
+
233
+
234
+ class BiSeNet(nn.Module):
235
+ def __init__(self, n_classes, *args, **kwargs):
236
+ super(BiSeNet, self).__init__()
237
+ self.cp = ContextPath()
238
+ # here self.sp is deleted
239
+ self.ffm = FeatureFusionModule(256, 256)
240
+ self.conv_out = BiSeNetOutput(256, 256, n_classes)
241
+ self.conv_out16 = BiSeNetOutput(128, 64, n_classes)
242
+ self.conv_out32 = BiSeNetOutput(128, 64, n_classes)
243
+ self.init_weight()
244
+
245
+ def get_mask(
246
+ self, x: torch.Tensor, crop_size: int
247
+ ) -> Tuple[torch.Tensor, torch.Tensor]:
248
+ x = F.interpolate(x, size=(512, 512))
249
+
250
+ parsed_face = self.forward(x)[0]
251
+
252
+ parsed_face = torch.argmax(parsed_face, dim=1, keepdim=True)
253
+
254
+ parsed_face = encode_segmentation_rgb_batch(parsed_face)
255
+
256
+ parsed_face = torch.where(
257
+ torch.sum(parsed_face, dim=[1, 2, 3], keepdim=True) > 5000,
258
+ parsed_face,
259
+ torch.zeros_like(parsed_face),
260
+ )
261
+
262
+ ignore_mask_ids = torch.sum(parsed_face, dim=[1, 2, 3]) == 0
263
+
264
+ parsed_face = parsed_face.float().mul_(1 / 255.0)
265
+
266
+ parsed_face = F.interpolate(
267
+ parsed_face, size=(crop_size, crop_size), mode="bilinear"
268
+ )
269
+
270
+ parsed_face = torch.sum(parsed_face, dim=1, keepdim=True)
271
+
272
+ return parsed_face, ignore_mask_ids
273
+
274
+ def forward(self, x):
275
+ H, W = x.size()[2:]
276
+ feat_res8, feat_cp8, feat_cp16 = self.cp(x) # here return res3b1 feature
277
+ feat_sp = feat_res8 # use res3b1 feature to replace spatial path feature
278
+ feat_fuse = self.ffm(feat_sp, feat_cp8)
279
+
280
+ feat_out = self.conv_out(feat_fuse)
281
+ feat_out16 = self.conv_out16(feat_cp8)
282
+ feat_out32 = self.conv_out32(feat_cp16)
283
+
284
+ feat_out = F.interpolate(feat_out, (H, W), mode="bilinear", align_corners=True)
285
+ feat_out16 = F.interpolate(
286
+ feat_out16, (H, W), mode="bilinear", align_corners=True
287
+ )
288
+ feat_out32 = F.interpolate(
289
+ feat_out32, (H, W), mode="bilinear", align_corners=True
290
+ )
291
+ return feat_out, feat_out16, feat_out32
292
+
293
+ def init_weight(self):
294
+ for ly in self.children():
295
+ if isinstance(ly, nn.Conv2d):
296
+ nn.init.kaiming_normal_(ly.weight, a=1)
297
+ if ly.bias is not None:
298
+ nn.init.constant_(ly.bias, 0)
299
+
300
+ def get_params(self):
301
+ wd_params, nowd_params, lr_mul_wd_params, lr_mul_nowd_params = [], [], [], []
302
+ for name, child in self.named_children():
303
+ child_wd_params, child_nowd_params = child.get_params()
304
+ if isinstance(child, FeatureFusionModule) or isinstance(
305
+ child, BiSeNetOutput
306
+ ):
307
+ lr_mul_wd_params += child_wd_params
308
+ lr_mul_nowd_params += child_nowd_params
309
+ else:
310
+ wd_params += child_wd_params
311
+ nowd_params += child_nowd_params
312
+ return wd_params, nowd_params, lr_mul_wd_params, lr_mul_nowd_params
313
+
314
+
315
+ if __name__ == "__main__":
316
+ net = BiSeNet(19)
317
+ net.cuda()
318
+ net.eval()
319
+ in_ten = torch.randn(16, 3, 640, 480).cuda()
320
+ out, out16, out32 = net(in_ten)
321
+ print(out.shape)
322
+
323
+ net.get_params()
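
In this project the parsing model is consumed through `get_mask`, which returns a soft face-plus-mouth mask per crop and a boolean vector marking crops whose mask was too small to keep. A minimal sketch (note that constructing `BiSeNet` downloads ImageNet ResNet-18 backbone weights via `torch.utils.model_zoo`, and meaningful masks require the parsing checkpoint):

```python
import torch
from src.PostProcess.ParsingModel.model import BiSeNet

net = BiSeNet(19).eval()                       # pulls the ResNet-18 backbone weights

crops = torch.rand(2, 3, 224, 224)             # ImageNet-normalized crops in practice
with torch.no_grad():
    mask, ignore_ids = net.get_mask(crops, crop_size=224)
print(mask.shape, ignore_ids)                  # (2, 1, 224, 224), tensor of bools
```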
src/PostProcess/ParsingModel/resnet.py ADDED
@@ -0,0 +1,109 @@
+ #!/usr/bin/python
+ # -*- encoding: utf-8 -*-
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torch.utils.model_zoo as modelzoo
+
+ # from modules.bn import InPlaceABNSync as BatchNorm2d
+
+ resnet18_url = "https://download.pytorch.org/models/resnet18-5c106cde.pth"
+
+
+ def conv3x3(in_planes, out_planes, stride=1):
+     """3x3 convolution with padding"""
+     return nn.Conv2d(
+         in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False
+     )
+
+
+ class BasicBlock(nn.Module):
+     def __init__(self, in_chan, out_chan, stride=1):
+         super(BasicBlock, self).__init__()
+         self.conv1 = conv3x3(in_chan, out_chan, stride)
+         self.bn1 = nn.BatchNorm2d(out_chan)
+         self.conv2 = conv3x3(out_chan, out_chan)
+         self.bn2 = nn.BatchNorm2d(out_chan)
+         self.relu = nn.ReLU(inplace=True)
+         self.downsample = None
+         if in_chan != out_chan or stride != 1:
+             self.downsample = nn.Sequential(
+                 nn.Conv2d(in_chan, out_chan, kernel_size=1, stride=stride, bias=False),
+                 nn.BatchNorm2d(out_chan),
+             )
+
+     def forward(self, x):
+         residual = self.conv1(x)
+         residual = F.relu(self.bn1(residual))
+         residual = self.conv2(residual)
+         residual = self.bn2(residual)
+
+         shortcut = x
+         if self.downsample is not None:
+             shortcut = self.downsample(x)
+
+         out = shortcut + residual
+         out = self.relu(out)
+         return out
+
+
+ def create_layer_basic(in_chan, out_chan, bnum, stride=1):
+     layers = [BasicBlock(in_chan, out_chan, stride=stride)]
+     for i in range(bnum - 1):
+         layers.append(BasicBlock(out_chan, out_chan, stride=1))
+     return nn.Sequential(*layers)
+
+
+ class Resnet18(nn.Module):
+     def __init__(self):
+         super(Resnet18, self).__init__()
+         self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
+         self.bn1 = nn.BatchNorm2d(64)
+         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
+         self.layer1 = create_layer_basic(64, 64, bnum=2, stride=1)
+         self.layer2 = create_layer_basic(64, 128, bnum=2, stride=2)
+         self.layer3 = create_layer_basic(128, 256, bnum=2, stride=2)
+         self.layer4 = create_layer_basic(256, 512, bnum=2, stride=2)
+         self.init_weight()
+
+     def forward(self, x):
+         x = self.conv1(x)
+         x = F.relu(self.bn1(x))
+         x = self.maxpool(x)
+
+         x = self.layer1(x)
+         feat8 = self.layer2(x)  # 1/8
+         feat16 = self.layer3(feat8)  # 1/16
+         feat32 = self.layer4(feat16)  # 1/32
+         return feat8, feat16, feat32
+
+     def init_weight(self):
+         state_dict = modelzoo.load_url(resnet18_url)
+         self_state_dict = self.state_dict()
+         for k, v in state_dict.items():
+             if "fc" in k:
+                 continue
+             self_state_dict.update({k: v})
+         self.load_state_dict(self_state_dict)
+
+     def get_params(self):
+         wd_params, nowd_params = [], []
+         for name, module in self.named_modules():
+             if isinstance(module, (nn.Linear, nn.Conv2d)):
+                 wd_params.append(module.weight)
+                 if module.bias is not None:
+                     nowd_params.append(module.bias)
+             elif isinstance(module, nn.BatchNorm2d):
+                 nowd_params += list(module.parameters())
+         return wd_params, nowd_params
+
+
+ if __name__ == "__main__":
+     net = Resnet18()
+     x = torch.randn(16, 3, 224, 224)
+     out = net(x)
+     print(out[0].size())
+     print(out[1].size())
+     print(out[2].size())
+     net.get_params()
src/PostProcess/utils.py ADDED
@@ -0,0 +1,122 @@
+ import numpy as np
+ import torch
+ import torch.nn.functional as F
+
+ from typing import Tuple
+
+
+ class SoftErosion(torch.nn.Module):
+     def __init__(
+         self, kernel_size: int = 15, threshold: float = 0.6, iterations: int = 1
+     ):
+         super(SoftErosion, self).__init__()
+         r = kernel_size // 2
+         self.padding = r
+         self.iterations = iterations
+         self.threshold = threshold
+
+         # Create kernel
+         y_indices, x_indices = torch.meshgrid(
+             torch.arange(0.0, kernel_size), torch.arange(0.0, kernel_size)
+         )
+         dist = torch.sqrt((x_indices - r) ** 2 + (y_indices - r) ** 2)
+         kernel = dist.max() - dist
+         kernel /= kernel.sum()
+         kernel = kernel.view(1, 1, *kernel.shape)
+         self.register_buffer("weight", kernel)
+
+     def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+         for i in range(self.iterations - 1):
+             x = torch.min(
+                 x,
+                 F.conv2d(
+                     x, weight=self.weight, groups=x.shape[1], padding=self.padding
+                 ),
+             )
+         x = F.conv2d(x, weight=self.weight, groups=x.shape[1], padding=self.padding)
+
+         mask = x >= self.threshold
+
+         x[mask] = 1.0
+         # add small epsilon to avoid Nans
+         x[~mask] /= (x[~mask].max() + 1e-7)
+
+         return x, mask
+
+
+ def encode_segmentation_rgb(
+     segmentation: np.ndarray, no_neck: bool = True
+ ) -> np.ndarray:
+     parse = segmentation
+     # https://github.com/zllrunning/face-parsing.PyTorch/blob/master/prepropess_data.py
+     face_part_ids = (
+         [1, 2, 3, 4, 5, 6, 10, 12, 13]
+         if no_neck
+         else [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14]
+     )
+     mouth_id = 11
+     # hair_id = 17
+     face_map = np.zeros([parse.shape[0], parse.shape[1]])
+     mouth_map = np.zeros([parse.shape[0], parse.shape[1]])
+     # hair_map = np.zeros([parse.shape[0], parse.shape[1]])
+
+     for valid_id in face_part_ids:
+         valid_index = np.where(parse == valid_id)
+         face_map[valid_index] = 255
+     valid_index = np.where(parse == mouth_id)
+     mouth_map[valid_index] = 255
+     # valid_index = np.where(parse==hair_id)
+     # hair_map[valid_index] = 255
+     # return np.stack([face_map, mouth_map,hair_map], axis=2)
+     return np.stack([face_map, mouth_map], axis=2)
+
+
+ def encode_segmentation_rgb_batch(
+     segmentation: torch.Tensor, no_neck: bool = True
+ ) -> torch.Tensor:
+     # https://github.com/zllrunning/face-parsing.PyTorch/blob/master/prepropess_data.py
+     face_part_ids = (
+         [1, 2, 3, 4, 5, 6, 10, 12, 13]
+         if no_neck
+         else [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14]
+     )
+     mouth_id = 11
+     # hair_id = 17
+     segmentation = segmentation.int()
+     face_map = torch.zeros_like(segmentation)
+     mouth_map = torch.zeros_like(segmentation)
+     # hair_map = np.zeros([parse.shape[0], parse.shape[1]])
+
+     white_tensor = face_map + 255
+     for valid_id in face_part_ids:
+         face_map = torch.where(segmentation == valid_id, white_tensor, face_map)
+     mouth_map = torch.where(segmentation == mouth_id, white_tensor, mouth_map)
+
+     return torch.cat([face_map, mouth_map], dim=1)
+
+
+ def postprocess(
+     swapped_face: np.ndarray,
+     target: np.ndarray,
+     target_mask: np.ndarray,
+     smooth_mask: torch.nn.Module,
+ ) -> np.ndarray:
+     # target_mask = cv2.resize(target_mask, (self.size, self.size))
+
+     mask_tensor = (
+         torch.from_numpy(target_mask.copy().transpose((2, 0, 1)))
+         .float()
+         .mul_(1 / 255.0)
+         .cuda()
+     )
+     face_mask_tensor = mask_tensor[0] + mask_tensor[1]
+
+     soft_face_mask_tensor, _ = smooth_mask(face_mask_tensor.unsqueeze_(0).unsqueeze_(0))
+     soft_face_mask_tensor.squeeze_()
+
+     soft_face_mask = soft_face_mask_tensor.cpu().numpy()
+     soft_face_mask = soft_face_mask[:, :, np.newaxis]
+
+     result = swapped_face * soft_face_mask + target * (1 - soft_face_mask)
+     result = result[:, :, ::-1]  # .astype(np.uint8)
+     return result
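
`SoftErosion` turns a hard segmentation mask into a feathered one by repeated min-filtering with a radial kernel, which is what smooths the seam when the swapped crop is pasted back onto the frame. A small sketch on a synthetic mask:

```python
import torch
from src.PostProcess.utils import SoftErosion

smooth = SoftErosion(kernel_size=17, threshold=0.9, iterations=7).eval()

mask = torch.zeros(1, 1, 224, 224)         # hard binary face mask
mask[:, :, 60:180, 60:180] = 1.0

with torch.no_grad():
    soft_mask, kept = smooth(mask)
print(soft_mask.shape, float(soft_mask.min()), float(soft_mask.max()))
```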
src/model_loader.py ADDED
@@ -0,0 +1,106 @@
+ from collections import namedtuple
+ import torch
+ from torch.utils import model_zoo
+ import requests
+ from tqdm import tqdm
+ from pathlib import Path
+
+ from src.FaceDetector.face_detector import FaceDetector
+ from src.FaceId.faceid import FaceId
+ from src.Generator.fs_networks_fix import Generator_Adain_Upsample
+ from src.PostProcess.ParsingModel.model import BiSeNet
+ from src.PostProcess.GFPGAN.gfpgan import GFPGANer
+ from src.Blend.blend import BlendModule
+
+
+ model = namedtuple("model", ["url", "model"])
+
+ models = {
+     "face_detector": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/face_detector_scrfd_10g_bnkps.onnx",
+         model=FaceDetector,
+     ),
+     "arcface": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/arcface_net.jit",
+         model=FaceId,
+     ),
+     "generator_224": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/simswap_224_latest_net_G.pth",
+         model=Generator_Adain_Upsample,
+     ),
+     "generator_512": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/simswap_512_390000_net_G.pth",
+         model=Generator_Adain_Upsample,
+     ),
+     "parsing_model": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/weights/parsing_model_79999_iter.pth",
+         model=BiSeNet,
+     ),
+     "gfpgan": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/v1.1/GFPGANv1.4_ema.pth",
+         model=GFPGANer,
+     ),
+     "blend_module": model(
+         url="https://github.com/mike9251/simswap-inference-pytorch/releases/download/v1.2/blend_module.jit",
+         model=BlendModule
+     )
+ }
+
+
+ def get_model(
+     model_name: str,
+     device: torch.device,
+     load_state_dice: bool,
+     model_path: Path,
+     **kwargs,
+ ):
+     dst_dir = Path.cwd() / "weights"
+     dst_dir.mkdir(exist_ok=True)
+
+     url = models[model_name].url if not model_path.is_file() else str(model_path)
+
+     if load_state_dice:
+         model = models[model_name].model(**kwargs)
+
+         if Path(url).is_file():
+             state_dict = torch.load(url)
+         else:
+             state_dict = model_zoo.load_url(
+                 url,
+                 model_dir=str(dst_dir),
+                 progress=True,
+                 map_location="cpu",
+             )
+
+         model.load_state_dict(state_dict)
+
+         model.to(device)
+         model.eval()
+     else:
+         dst_path = Path(url)
+
+         if not dst_path.is_file():
+             dst_path = dst_dir / Path(url).name
+
+         if not dst_path.is_file():
+             print(f"Downloading: '{url}' to {dst_path}")
+             response = requests.get(url, stream=True)
+             if int(response.status_code) == 200:
+                 file_size = int(response.headers["Content-Length"]) / (2 ** 20)
+                 chunk_size = 1024
+                 bar_format = "{desc}: {percentage:3.0f}%|{bar}| {n:3.1f}M/{total:3.1f}M [{elapsed}<{remaining}]"
+                 with open(dst_path, "wb") as handle:
+                     with tqdm(total=file_size, bar_format=bar_format) as pbar:
+                         for data in response.iter_content(chunk_size=chunk_size):
+                             handle.write(data)
+                             pbar.update(len(data) / (2 ** 20))
+             else:
+                 raise ValueError(
+                     f"Couldn't download weights {url}. Specify weights for the '{model_name}' model manually."
+                 )
+
+         kwargs.update({"model_path": str(dst_path), "device": device})
+
+         model = models[model_name].model(**kwargs)
+
+     return model
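
`get_model` is the single entry point used by `SimSwap` below: with `load_state_dice=True` (the keyword spelling used throughout the repo) it instantiates the class and loads a state dict, otherwise it downloads the raw file and passes its path to the constructor. A hypothetical call that fetches the face-parsing checkpoint into `./weights/` (network access required on first run):

```python
import torch
from pathlib import Path
from src.model_loader import get_model

bise_net = get_model(
    "parsing_model",
    device=torch.device("cpu"),
    load_state_dice=True,
    model_path=Path("weights/parsing_model_79999_iter.pth"),  # used if it already exists
    n_classes=19,
)
print(type(bise_net).__name__)  # BiSeNet, in eval mode on the chosen device
```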
src/simswap.py ADDED
@@ -0,0 +1,322 @@
1
+ import numpy as np
2
+ import torch
3
+ import torch.nn.functional as F
4
+ from typing import Iterable, Tuple, Union
5
+ from pathlib import Path
6
+ from torchvision import transforms
7
+ import kornia
8
+ from omegaconf import DictConfig
9
+
10
+ from src.FaceDetector.face_detector import Detection
11
+ from src.FaceAlign.face_align import align_face, inverse_transform_batch
12
+ from src.PostProcess.utils import SoftErosion
13
+ from src.model_loader import get_model
14
+ from src.Misc.types import CheckpointType, FaceAlignmentType
15
+ from src.Misc.utils import tensor2img
16
+
17
+
18
+ class SimSwap:
19
+ def __init__(
20
+ self,
21
+ config: DictConfig,
22
+ id_image: Union[np.ndarray, None] = None,
23
+ specific_image: Union[np.ndarray, None] = None,
24
+ ):
25
+
26
+ self.id_image: Union[np.ndarray, None] = id_image
27
+ self.id_latent: Union[torch.Tensor, None] = None
28
+ self.specific_id_image: Union[np.ndarray, None] = specific_image
29
+ self.specific_latent: Union[torch.Tensor, None] = None
30
+
31
+ self.use_mask: Union[bool, None] = True
32
+ self.crop_size: Union[int, None] = None
33
+ self.checkpoint_type: Union[CheckpointType, None] = None
34
+ self.face_alignment_type: Union[FaceAlignmentType, None] = None
35
+ self.smooth_mask_iter: Union[int, None] = None
36
+ self.smooth_mask_kernel_size: Union[int, None] = None
37
+ self.smooth_mask_threshold: Union[float, None] = None
38
+ self.face_detector_threshold: Union[float, None] = None
39
+ self.specific_latent_match_threshold: Union[float, None] = None
40
+ self.device = torch.device(config.device)
41
+
42
+ self.set_parameters(config)
43
+
44
+ # For BiSeNet and for official_224 SimSwap
45
+ self.to_tensor_normalize = transforms.Compose(
46
+ [
47
+ transforms.ToTensor(),
48
+ transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
49
+ ]
50
+ )
51
+
52
+ # For SimSwap models trained with the updated code
53
+ self.to_tensor = transforms.ToTensor()
54
+
55
+ self.face_detector = get_model(
56
+ "face_detector",
57
+ device=self.device,
58
+ load_state_dice=False,
59
+ model_path=Path(config.face_detector_weights),
60
+ det_thresh=self.face_detector_threshold,
61
+ det_size=(640, 640),
62
+ mode="ffhq",
63
+ )
64
+
65
+ self.face_id_net = get_model(
66
+ "arcface",
67
+ device=self.device,
68
+ load_state_dice=False,
69
+ model_path=Path(config.face_id_weights),
70
+ )
71
+
72
+ self.bise_net = get_model(
73
+ "parsing_model",
74
+ device=self.device,
75
+ load_state_dice=True,
76
+ model_path=Path(config.parsing_model_weights),
77
+ n_classes=19,
78
+ )
79
+
80
+ gen_model = "generator_512" if self.crop_size == 512 else "generator_224"
81
+ self.simswap_net = get_model(
82
+ gen_model,
83
+ device=self.device,
84
+ load_state_dice=True,
85
+ model_path=Path(config.simswap_weights),
86
+ input_nc=3,
87
+ output_nc=3,
88
+ latent_size=512,
89
+ n_blocks=9,
90
+ deep=True if self.crop_size == 512 else False,
91
+ use_last_act=True
92
+ if self.checkpoint_type == CheckpointType.OFFICIAL_224
93
+ else False,
94
+ )
95
+
96
+ self.blend = get_model(
97
+ "blend_module",
98
+ device=self.device,
99
+ load_state_dice=False,
100
+ model_path=Path(config.blend_module_weights)
101
+ )
102
+
103
+ self.enhance_output = config.enhance_output
104
+ if config.enhance_output:
105
+ self.gfpgan_net = get_model(
106
+ "gfpgan",
107
+ device=self.device,
108
+ load_state_dice=True,
109
+ model_path=Path(config.gfpgan_weights)
110
+ )
111
+
112
+ def set_parameters(self, config) -> None:
113
+ self.set_crop_size(config.crop_size)
114
+ self.set_checkpoint_type(config.checkpoint_type)
115
+ self.set_face_alignment_type(config.face_alignment_type)
116
+ self.set_face_detector_threshold(config.face_detector_threshold)
117
+ self.set_specific_latent_match_threshold(config.specific_latent_match_threshold)
118
+ self.set_smooth_mask_kernel_size(config.smooth_mask_kernel_size)
119
+ self.set_smooth_mask_threshold(config.smooth_mask_threshold)
120
+ self.set_smooth_mask_iter(config.smooth_mask_iter)
121
+
122
+ def set_crop_size(self, crop_size: int) -> None:
123
+ if crop_size < 0:
124
+ raise "Invalid crop_size! Must be a positive value."
125
+
126
+ self.crop_size = crop_size
127
+
128
+ def set_checkpoint_type(self, checkpoint_type: str) -> None:
129
+ type = CheckpointType(checkpoint_type)
130
+ if type not in (CheckpointType.OFFICIAL_224, CheckpointType.UNOFFICIAL):
131
+ raise "Invalid checkpoint_type! Must be one of the predefined values."
132
+
133
+ self.checkpoint_type = type
134
+
135
+ def set_face_alignment_type(self, face_alignment_type: str) -> None:
136
+ type = FaceAlignmentType(face_alignment_type)
137
+ if type not in (
138
+ FaceAlignmentType.FFHQ,
139
+ FaceAlignmentType.DEFAULT,
140
+ ):
141
+ raise "Invalid face_alignment_type! Must be one of the predefined values."
142
+
143
+ self.face_alignment_type = type
144
+
145
+ def set_face_detector_threshold(self, face_detector_threshold: float) -> None:
146
+ if face_detector_threshold < 0.0 or face_detector_threshold > 1.0:
147
+ raise "Invalid face_detector_threshold! Must be a positive value in range [0.0...1.0]."
148
+
149
+ self.face_detector_threshold = face_detector_threshold
150
+
151
+ def set_specific_latent_match_threshold(
152
+ self, specific_latent_match_threshold: float
153
+ ) -> None:
154
+ if specific_latent_match_threshold < 0.0:
155
+ raise "Invalid specific_latent_match_th! Must be a positive value."
156
+
157
+ self.specific_latent_match_threshold = specific_latent_match_threshold
158
+
159
+ def re_initialize_soft_mask(self):
160
+ self.smooth_mask = SoftErosion(kernel_size=self.smooth_mask_kernel_size,
161
+ threshold=self.smooth_mask_threshold,
162
+ iterations=self.smooth_mask_iter).to(self.device)
163
+
164
+ def set_smooth_mask_kernel_size(self, smooth_mask_kernel_size: int) -> None:
165
+ if smooth_mask_kernel_size < 0:
166
+ raise "Invalid smooth_mask_kernel_size! Must be a positive value."
167
+ smooth_mask_kernel_size += 1 if smooth_mask_kernel_size % 2 == 0 else 0
168
+ self.smooth_mask_kernel_size = smooth_mask_kernel_size
169
+ self.re_initialize_soft_mask()
170
+
171
+ def set_smooth_mask_threshold(self, smooth_mask_threshold: float) -> None:
172
+ if smooth_mask_threshold < 0 or smooth_mask_threshold > 1.0:
173
+ raise "Invalid smooth_mask_threshold! Must be within 0...1 range."
174
+ self.smooth_mask_threshold = smooth_mask_threshold
175
+ self.re_initialize_soft_mask()
176
+
177
+ def set_smooth_mask_iter(self, smooth_mask_iter: int) -> None:
178
+ if smooth_mask_iter < 0:
179
+ raise "Invalid smooth_mask_iter! Must be a positive value.."
180
+ self.smooth_mask_iter = smooth_mask_iter
181
+ self.re_initialize_soft_mask()
182
+
183
+ def run_detect_align(self, image: np.ndarray, for_id: bool = False) -> Tuple[Union[Iterable[np.ndarray], None],
184
+ Union[Iterable[np.ndarray], None],
185
+ np.ndarray]:
186
+ detection: Detection = self.face_detector(image)
187
+
188
+ if detection.bbox is None:
189
+ if for_id:
190
+ raise "Can't detect a face! Please change the ID image!"
191
+ return None, None, detection.score
192
+
193
+ kps = detection.key_points
194
+
195
+ if for_id:
196
+ max_score_ind = np.argmax(detection.score, axis=0)
197
+ kps = detection.key_points[max_score_ind]
198
+ kps = kps[None, ...]
199
+
200
+ align_imgs, transforms = align_face(
201
+ image,
202
+ kps,
203
+ crop_size=self.crop_size,
204
+ mode="ffhq"
205
+ if self.face_alignment_type == FaceAlignmentType.FFHQ
206
+ else "none",
207
+ )
208
+
209
+ return align_imgs, transforms, detection.score
210
+
211
+ def __call__(self, att_image: np.ndarray) -> np.ndarray:
212
+ if self.id_latent is None:
213
+ align_id_imgs, id_transforms, _ = self.run_detect_align(
214
+ self.id_image, for_id=True
215
+ )
216
+ # normalize=True because the official SimSwap model was trained with normalized id latents
217
+ self.id_latent: torch.Tensor = self.face_id_net(
218
+ align_id_imgs, normalize=True
219
+ )
220
+
221
+ if self.specific_id_image is not None and self.specific_latent is None:
222
+ align_specific_imgs, specific_transforms, _ = self.run_detect_align(
223
+ self.specific_id_image, for_id=True
224
+ )
225
+ self.specific_latent: torch.Tensor = self.face_id_net(
226
+ align_specific_imgs, normalize=False
227
+ )
228
+
229
+ # for_id=False, because we want to get all faces
230
+ align_att_imgs, att_transforms, att_detection_score = self.run_detect_align(
231
+ att_image, for_id=False
232
+ )
233
+
234
+ if align_att_imgs is None and att_transforms is None:
235
+ return att_image
236
+
237
+ # Select specific crop from the target image
238
+ if self.specific_latent is not None:
239
+ att_latent: torch.Tensor = self.face_id_net(align_att_imgs, normalize=False)
240
+ latent_dist = torch.mean(
241
+ F.mse_loss(
242
+ att_latent,
243
+ self.specific_latent.repeat(att_latent.shape[0], 1),
244
+ reduction="none",
245
+ ),
246
+ dim=-1,
247
+ )
248
+
249
+ att_detection_score = torch.tensor(
250
+ att_detection_score, device=latent_dist.device
251
+ )
252
+
253
+ min_index = torch.argmin(latent_dist * att_detection_score)
254
+ min_value = latent_dist[min_index]
255
+
256
+ if min_value < self.specific_latent_match_threshold:
257
+ align_att_imgs = [align_att_imgs[min_index]]
258
+ att_transforms = [att_transforms[min_index]]
259
+ else:
260
+ return att_image
261
+
262
+ swapped_img: torch.Tensor = self.simswap_net(align_att_imgs, self.id_latent)
263
+
264
+ if self.enhance_output:
265
+ swapped_img = self.gfpgan_net.enhance(swapped_img, weight=0.5)
266
+
267
+ # Put all crops/transformations into a batch
268
+ align_att_img_batch_for_parsing_model: torch.Tensor = torch.stack(
269
+ [self.to_tensor_normalize(x) for x in align_att_imgs], dim=0
270
+ )
271
+ align_att_img_batch_for_parsing_model = (
272
+ align_att_img_batch_for_parsing_model.to(self.device)
273
+ )
274
+
275
+ att_transforms: torch.Tensor = torch.stack(
276
+ [torch.tensor(x).float() for x in att_transforms], dim=0
277
+ )
278
+ att_transforms = att_transforms.to(self.device, non_blocking=True)
279
+
280
+ align_att_img_batch: torch.Tensor = torch.stack(
281
+ [self.to_tensor(x) for x in align_att_imgs], dim=0
282
+ )
283
+ align_att_img_batch = align_att_img_batch.to(self.device, non_blocking=True)
284
+
285
+ # Get face masks for the attribute image
286
+ face_mask, ignore_mask_ids = self.bise_net.get_mask(
287
+ align_att_img_batch_for_parsing_model, self.crop_size
288
+ )
289
+
290
+ inv_att_transforms: torch.Tensor = inverse_transform_batch(att_transforms)
291
+
292
+ soft_face_mask, _ = self.smooth_mask(face_mask)
293
+
294
+ swapped_img[ignore_mask_ids, ...] = align_att_img_batch[ignore_mask_ids, ...]
295
+
296
+ frame_size = (att_image.shape[0], att_image.shape[1])
297
+
298
+ att_image = self.to_tensor(att_image).to(self.device, non_blocking=True).unsqueeze(0)
299
+
300
+ target_image = kornia.geometry.transform.warp_affine(
301
+ swapped_img,
302
+ inv_att_transforms,
303
+ frame_size,
304
+ mode="bilinear",
305
+ padding_mode="border",
306
+ align_corners=True,
307
+ fill_value=torch.zeros(3),
308
+ )
309
+
310
+ soft_face_mask = kornia.geometry.transform.warp_affine(
311
+ soft_face_mask,
312
+ inv_att_transforms,
313
+ frame_size,
314
+ mode="bilinear",
315
+ padding_mode="zeros",
316
+ align_corners=True,
317
+ fill_value=torch.zeros(3),
318
+ )
319
+
320
+ result = self.blend(target_image, soft_face_mask, att_image)
321
+
322
+ return tensor2img(result)
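For completeness, a minimal, hypothetical end-to-end sketch of driving the `SimSwap` class above on a single image. The config keys mirror the attributes read in `__init__`/`set_parameters`; the concrete values (weight paths, the `checkpoint_type`/`face_alignment_type` strings, thresholds) are illustrative assumptions rather than values copied from the repo's `configs/*.yaml`. Missing weight files are downloaded automatically by `get_model`.

```python
# Hypothetical usage sketch of src.simswap.SimSwap; all concrete values are assumptions.
import cv2
from omegaconf import OmegaConf

from src.simswap import SimSwap

config = OmegaConf.create(
    {
        "device": "cpu",
        "crop_size": 224,
        "checkpoint_type": "official_224",      # assumed CheckpointType enum value
        "face_alignment_type": "none",          # assumed FaceAlignmentType enum value
        "face_detector_threshold": 0.6,
        "specific_latent_match_threshold": 0.05,
        "smooth_mask_kernel_size": 17,
        "smooth_mask_threshold": 0.9,
        "smooth_mask_iter": 7,
        # Placeholder paths: if a file is missing, get_model falls back to the release URL.
        "face_detector_weights": "weights/face_detector.onnx",
        "face_id_weights": "weights/face_id.pth",
        "parsing_model_weights": "weights/parsing_model_79999_iter.pth",
        "simswap_weights": "weights/simswap_224_latest_net_G.pth",
        "blend_module_weights": "weights/blend_module.jit",
        "gfpgan_weights": "weights/GFPGANv1.4_ema.pth",
        "enhance_output": False,
    }
)

id_image = cv2.imread("demo_file/Iron_man.jpg")          # face to take the identity from
target_image = cv2.imread("demo_file/multi_people.jpg")  # frame whose faces get swapped

swapper = SimSwap(config=config, id_image=id_image)
result = swapper(target_image)
cv2.imwrite("result.jpg", result)
```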