# SD-VITON-Virtual-Try-On

This is the official repository for the following paper:

> **Towards Squeezing-Averse Virtual Try-On via Sequential Deformation** [[arxiv]](https://arxiv.org/pdf/2312.15861.pdf)
>
> Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo
> Accepted by **AAAI 2024**.

![Issue-1](figures/fig1.jpg)

## Notice

This repository is currently built only for sharing the source code of an academic research paper.
It has several limitations. Please check them out below.

## News

- *2024-01-31* We have released the source code and checkpoints.

## Installation

Clone this repository:

```
git clone https://github.com/SHShim0513/SD-VITON.git
cd ./SD-VITON/
```

Install PyTorch and other dependencies:

```
conda create -n {env_name} python=3.8
conda activate {env_name}
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install opencv-python torchgeometry Pillow tqdm tensorboardX scikit-image scipy timm==0.4.12
```
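
As a quick sanity check before running the scripts, you can verify that the dependencies above are importable in the active environment. This helper is our suggestion, not part of the repository:

```python
# Sanity check (not part of the repository): report which of the dependencies
# installed above cannot be imported in the current environment.
import importlib.util

REQUIRED = ["torch", "torchvision", "cv2", "torchgeometry", "PIL",
            "tqdm", "tensorboardX", "skimage", "scipy", "timm"]

def missing_dependencies(modules=REQUIRED):
    """Return the modules from the list that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```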

## Dataset

We train and evaluate our model using the dataset from the following [link](https://github.com/shadow2496/VITON-HD).
We assume that you have downloaded it into `./data`.

## Inference

Here are the download links for each model checkpoint:

|Dataset|Network Type|Output Resolution|Google Cloud|
|--------|--------|--------|-----------|
| VITON-HD | Try-on condition generator | Appearance flows at 128 x 96 | [Download](https://drive.google.com/drive/folders/1sqKNvyTsF8HGAv72wV2nLIeZmYA1Za9V?usp=drive_link) |
| VITON-HD | Try-on image generator | Images at 1024 x 768 | [Download](https://drive.google.com/drive/folders/1nsbtVsjC2Y0XEZA9SYYrmI4K3TPr5--p?usp=drive_link) |

- AlexNet (LPIPS): [link](https://drive.google.com/file/d/1CJ2HLzlYjp0PXgbeAH90CdJhZbHRVEKN/view?usp=drive_link). We assume that you have downloaded it into `./eval_models/weights/v0.1`.

```
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --composition_mask
```

## Training

### Try-on condition generator

```
python3 train_condition.py --gpu_ids {gpu_ids} --Ddownx2 --Ddropout --interflowloss --occlusion --tvlambda_tvob 2.0 --tvlambda_taco 2.0
```

### Try-on image generator

```
python3 train_generator.py --name test -b 4 -j 8 --gpu_ids {gpu_ids} --fp16 --tocg_checkpoint {condition generator ckpt path} --occlusion --composition_mask
```
This stage takes approximately 4 days on two A6000 GPUs.

To use the `--fp16` option, you need to install the NVIDIA Apex library.

## Limitations

Our work still has several limitations; to the best of our knowledge, they are not unique to our method.

### Issue #1: crack

Several samples suffer from a crack artifact.
To the best of our knowledge, the crack is amplified by the up-sizing of the last appearance flows (AFs).
*E.g.*, our network infers the last AFs at 128 x 96 resolution and then up-scales them to 1024 x 768.
As a result, the crack regions are enlarged as well.

![Issue-1](figures/fig2.jpg)

One way to slightly reduce this artifact is to infer the last AFs at a resolution closer to the image resolution (see "After").
We provide checkpoints where the networks infer the AFs at 256 x 192 and the image at 512 x 384 resolution.

|Dataset|Network Type|Output Resolution|Google Cloud|
|--------|--------|--------|-----------|
| VITON-HD | Try-on condition generator | Appearance flows at 256 x 192 | [Download](https://drive.google.com/drive/folders/1IUJeJQgdwJgoLRZ3v3zlGqKZpVNa-FHM?usp=share_link) |
| VITON-HD | Try-on image generator | Images at 512 x 384 | [Download](https://drive.google.com/drive/folders/1X4-oAans5bg72aei9rCM0P2tCbFxuB26?usp=share_link) |

The corresponding inference script is as follows:
```
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --fine_width 384 --fine_height 512 --num_upsampling_layers more --cond_G_ngf 48 --cond_G_input_width 384 --cond_G_input_height 512 --cond_G_num_layers 6
```

### Issue #2: clothes behind the neck

As with other methods, our network cannot fully remove the clothes textures behind the neck.
Thereby, they remain in the generated samples.

A solution would be to mask out such regions when pre-processing the inputs.
We did not apply such an additional technique, since such masks are not included in the dataset.

## Acknowledgments

This repository is built on the HR-VITON repository. Thanks for the great work.

## Citation

If you find this work useful for your research, please cite our paper:

```
@article{shim2023towards,
  title={Towards Squeezing-Averse Virtual Try-On via Sequential Deformation},
  author={Shim, Sang-Heon and Chung, Jiwoo and Heo, Jae-Pil},
  journal={arXiv preprint arXiv:2312.15861},
  year={2023}
}
```