# InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

<p align="center">
  <a href="https://gengzigang.github.io/instructdiffusion.github.io/">Project Page</a> |
  <a href="https://arxiv.org/pdf/2309.03895.pdf">arXiv</a> |
  <a href="https://f605b16c6b183b13ac.gradio.live">Web Demo</a> |
  <a href="#QuickStart">QuickStart</a> |
  <a href="#Training">Training</a> |
  <a href="#Acknowledgements">Acknowledgements</a> |
  <a href="#Citation">Citation</a>
</p>

<div align="center">
  <img src="figure/teaser.png" width="1000"/>
</div>

This is the PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Our code is based on [Instruct-pix2pix](https://github.com/timothybrooks/instruct-pix2pix) and [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion).

## QuickStart
Follow the steps below to quickly edit your own images. The inference code in our repository requires **one GPU with more than 9 GB of memory** to edit images at a resolution of **512**.

1. Clone this repo.
2. Set up the conda environment:
```
conda env create -f environment.yaml
conda activate instructdiff
```
3. We provide a well-trained [checkpoint](https://mailustceducn-my.sharepoint.com/:u:/g/personal/aa397601_mail_ustc_edu_cn/EZmXduulFidIhJD73SGcbOoBNpm18CJmU4PgPTS21RM2Ow?e=KqQYpO) and a [checkpoint](https://mailustceducn-my.sharepoint.com/:u:/g/personal/aa397601_mail_ustc_edu_cn/EWlNmyeS9P1BkRg_IlXbPbwBeNMQXQTcIA0pCokyd61UWg?e=iKfRdk) that has undergone human alignment. Download them to the `checkpoints` folder and feel free to try both.

4. You can edit your own images:
```bash
python edit_cli.py --input example.jpg --edit "Transform it to van Gogh, starry night style."

# Optionally, you can customize the parameters with the following flags:
# --resolution 512 --steps 50 --config configs/instruct_diffusion.yaml --ckpt YOUR_CHECKPOINT --cfg-text 3.5 --cfg-image 1.25

# We also support loading an image from a URL and editing it, e.g.:
python edit_cli.py --input "https://wallup.net/wp-content/uploads/2016/01/207131-animals-nature-lion.jpg" \
    --edit "Transform it to van Gogh, starry night style." \
    --resolution 512 --steps 50 \
    --config configs/instruct_diffusion.yaml \
    --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt \
    --outdir logs/
```
For other tasks, recommended parameter settings can be found in [`scripts/inference_example.sh`](./scripts/inference_example.sh). A sketch for batch-editing a folder of images is shown after this list.

5. (Optional) You can launch your own interactive editing Gradio app:
```bash
python edit_app.py

# You can also specify the path to the checkpoint
# The default checkpoint is checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt
python edit_app.py --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt
```
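
If you want to apply the same instruction to a whole folder of images, a simple shell loop around `edit_cli.py` works. The sketch below is an assumption built only from the flags documented above: it treats each invocation as editing a single input image, and it does not specify how output files inside `--outdir` are named, so check that successive runs do not overwrite each other.

```bash
#!/usr/bin/env bash
# Hypothetical batch-editing sketch: applies one instruction to every .jpg in input_images/.
# Only the edit_cli.py flags shown above are used; adjust paths and the instruction to your data.
mkdir -p logs/batch
for img in input_images/*.jpg; do
    python edit_cli.py --input "$img" \
        --edit "Transform it to van Gogh, starry night style." \
        --resolution 512 --steps 50 \
        --config configs/instruct_diffusion.yaml \
        --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt \
        --outdir logs/batch
done
```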

## Training
The code was developed with Python 3.8 on Ubuntu 18.04, and was trained and tested on 48 NVIDIA V100 GPUs with 32 GB of memory each. Other platforms have not been fully tested.

### Installation
1. Clone this repo.
2. Set up the conda environment:
```
conda env create -f environment.yaml
conda activate instructdiff
```
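
Before launching a long run, it can help to confirm that the environment actually sees your GPUs. The one-liner below is a generic PyTorch sanity check, not a script shipped with this repository.

```bash
# Generic sanity check (not part of this repo): confirm PyTorch imports and CUDA devices are visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```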

### Pre-trained Model Preparation
Use the following command to download the official pre-trained Stable Diffusion model, or download the model produced by our pretraining adaptation process from [OneDrive](https://mailustceducn-my.sharepoint.com/:u:/g/personal/aa397601_mail_ustc_edu_cn/EXJSMIpFev5Nj0kuKI88U1IBZDSjegp3G8ukku0OxRRjFQ?e=QhnnB4) and put it into the folder `stable_diffusion/models/ldm/stable-diffusion-v1/`.
```
bash scripts/download_pretrained_sd.sh
```
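
If you fetch the adapted model from OneDrive manually, it only needs to end up under the folder named above. The filename in the sketch below is a placeholder, not the actual checkpoint name; keep whatever name your config expects.

```bash
# Hypothetical example of placing a manually downloaded checkpoint.
# DOWNLOADED_MODEL.ckpt is a placeholder name, not the real filename.
mkdir -p stable_diffusion/models/ldm/stable-diffusion-v1/
mv ~/Downloads/DOWNLOADED_MODEL.ckpt stable_diffusion/models/ldm/stable-diffusion-v1/
```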

### Data Preparation
Refer to the [dataset documentation](https://github.com/cientgu/InstructDiffusion/tree/main/dataset) to prepare your data.

### Training Command
For multi-GPU training on a single machine, you can use the following command:
```
python -m torch.distributed.launch --nproc_per_node=8 main.py --name v0 --base configs/instruct_diffusion.yaml --train --logdir logs/instruct_diffusion
```
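
The command above assumes 8 GPUs. On a machine with fewer, a reasonable adaptation is to adjust `--nproc_per_node` and restrict the visible devices, as sketched below; note this is an assumption, and the per-GPU batch size or learning rate in the config may also need retuning, which is not covered here.

```bash
# Sketch: train on 4 GPUs (devices 0-3) instead of 8; other arguments unchanged.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 \
    main.py --name v0 --base configs/instruct_diffusion.yaml --train --logdir logs/instruct_diffusion
```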

For multi-GPU training on multiple machines, you can use the following command (using 6 machines as an example):
```
bash run_multinode.sh instruct_diffusion v0 6
```

### Convert the EMA Model
You can get the final EMA checkpoint for inference using the command below:
```
python convert_ckpt.py --ema-ckpt logs/instruct_diffusion/checkpoint/ckpt_epoch_200/state.pth --out-ckpt checkpoints/v1-5-pruned-emaonly-adaption-task.ckpt
```
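
Once converted, the checkpoint can be passed to the inference script from the QuickStart section. The flags below simply mirror that earlier example with the converted checkpoint path; they are not additional requirements of `convert_ckpt.py`.

```bash
# Run editing with the freshly converted EMA checkpoint (flags as in the QuickStart example).
python edit_cli.py --input example.jpg \
    --edit "Transform it to van Gogh, starry night style." \
    --resolution 512 --steps 50 \
    --config configs/instruct_diffusion.yaml \
    --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task.ckpt \
    --outdir logs/
```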

## Acknowledgements

Thanks to:
- [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
- [Instruct-pix2pix](https://github.com/timothybrooks/instruct-pix2pix)

## Citation

```
@article{Geng23instructdiff,
  author  = {Zigang Geng and
             Binxin Yang and
             Tiankai Hang and
             Chen Li and
             Shuyang Gu and
             Ting Zhang and
             Jianmin Bao and
             Zheng Zhang and
             Han Hu and
             Dong Chen and
             Baining Guo},
  title   = {InstructDiffusion: {A} Generalist Modeling Interface for Vision Tasks},
  journal = {CoRR},
  volume  = {abs/2309.03895},
  year    = {2023},
  url     = {https://doi.org/10.48550/arXiv.2309.03895},
  doi     = {10.48550/arXiv.2309.03895},
}
```