# SD-VITON-Virtual-Try-On

This is the official repository for the following paper:

> **Towards Squeezing-Averse Virtual Try-On via Sequential Deformation** [[arxiv]](https://arxiv.org/pdf/2312.15861.pdf)
>
> Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo
> Accepted by **AAAI 2024**.

![Issue-1](figures/fig1.jpg)

## Notice

This repository is currently built only for sharing the source code of an academic research paper.
It has several limitations. Please check them out below.

## News

- *2024-01-31* We have released the source code and checkpoints.

## Installation

Clone this repository:

```
git clone https://github.com/SHShim0513/SD-VITON.git
cd ./SD-VITON/
```

Install PyTorch and other dependencies:

```
conda create -n {env_name} python=3.8
conda activate {env_name}
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install opencv-python torchgeometry Pillow tqdm tensorboardX scikit-image scipy timm==0.4.12
```
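
As a quick sanity check before running the scripts, you can verify that the dependencies above are importable in the active environment. This helper is our suggestion, not part of the repository:

```python
# Sanity check (not part of the repository): report which of the dependencies
# installed above cannot be imported in the current environment.
import importlib.util

REQUIRED = ["torch", "torchvision", "cv2", "torchgeometry", "PIL",
            "tqdm", "tensorboardX", "skimage", "scipy", "timm"]

def missing_dependencies(modules=REQUIRED):
    """Return the modules from the list that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```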

## Dataset

We train and evaluate our model using the dataset from the following [link](https://github.com/shadow2496/VITON-HD).
We assume that you have downloaded it into `./data`.

## Inference

Here are the download links for each model checkpoint:

|Dataset|Network Type|Output Resolution|Google Cloud|
|--------|--------|--------|-----------|
| VITON-HD | Try-on condition generator | Appearance flows at 128 x 96 | [Download](https://drive.google.com/drive/folders/1sqKNvyTsF8HGAv72wV2nLIeZmYA1Za9V?usp=drive_link) |
| VITON-HD | Try-on image generator | Images at 1024 x 768 | [Download](https://drive.google.com/drive/folders/1nsbtVsjC2Y0XEZA9SYYrmI4K3TPr5--p?usp=drive_link) |

- AlexNet (LPIPS): [link](https://drive.google.com/file/d/1CJ2HLzlYjp0PXgbeAH90CdJhZbHRVEKN/view?usp=drive_link). We assume that you have downloaded it into `./eval_models/weights/v0.1`.

```
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --composition_mask
```

## Training

### Try-on condition generator

```
python3 train_condition.py --gpu_ids {gpu_ids} --Ddownx2 --Ddropout --interflowloss --occlusion --tvlambda_tvob 2.0 --tvlambda_taco 2.0
```

### Try-on image generator

```
python3 train_generator.py --name test -b 4 -j 8 --gpu_ids {gpu_ids} --fp16 --tocg_checkpoint {condition generator ckpt path} --occlusion --composition_mask
```
This stage takes approximately 4 days on two A6000 GPUs.

To use the `--fp16` option, you need to install the NVIDIA Apex library.

## Limitations

Our work still has several limitations; to the best of our knowledge, they are not unique to our method.

### Issue #1: crack

Several samples suffer from a crack artifact.
To the best of our knowledge, the crack is amplified by the up-sizing of the last appearance flows (AFs).
*E.g.*, our network infers the last AFs at 128 x 96 resolution and then up-scales them to 1024 x 768.
As a result, the crack regions are enlarged as well.

![Issue-1](figures/fig2.jpg)

One way to slightly reduce this artifact is to infer the last AFs at a resolution closer to the image resolution (see "After").
We provide checkpoints where the networks infer the AFs at 256 x 192 and the image at 512 x 384 resolution.

|Dataset|Network Type|Output Resolution|Google Cloud|
|--------|--------|--------|-----------|
| VITON-HD | Try-on condition generator | Appearance flows at 256 x 192 | [Download](https://drive.google.com/drive/folders/1IUJeJQgdwJgoLRZ3v3zlGqKZpVNa-FHM?usp=share_link) |
| VITON-HD | Try-on image generator | Images at 512 x 384 | [Download](https://drive.google.com/drive/folders/1X4-oAans5bg72aei9rCM0P2tCbFxuB26?usp=share_link) |

The corresponding inference script is as follows:
```
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --fine_width 384 --fine_height 512 --num_upsampling_layers more --cond_G_ngf 48 --cond_G_input_width 384 --cond_G_input_height 512 --cond_G_num_layers 6
```

### Issue #2: clothes behind the neck

As with other methods, our network cannot fully remove the clothes textures behind the neck.
Thereby, they remain in the generated samples.

A solution would be to mask out such regions when pre-processing the inputs.
We did not apply such an additional technique, since such masks are not included in the dataset.

## Acknowledgments

This repository is built on the HR-VITON repository. Thanks for the great work.

## Citation

If you find this work useful for your research, please cite our paper:

```
@article{shim2023towards,
  title={Towards Squeezing-Averse Virtual Try-On via Sequential Deformation},
  author={Shim, Sang-Heon and Chung, Jiwoo and Heo, Jae-Pil},
  journal={arXiv preprint arXiv:2312.15861},
  year={2023}
}
```