zhendongw committed
Commit 83fc1e6 · 1 Parent(s): 347de42

Upload 9 files

.gitattributes CHANGED
@@ -32,3 +32,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/edit_results.png filter=lfs diff=lfs merge=lfs -text
+ assets/generalization_results.png filter=lfs diff=lfs merge=lfs -text
+ assets/more_example_depth.png filter=lfs diff=lfs merge=lfs -text
+ assets/more_example_hed.png filter=lfs diff=lfs merge=lfs -text
+ assets/more_example_seg.png filter=lfs diff=lfs merge=lfs -text
+ assets/multi_task_results.png filter=lfs diff=lfs merge=lfs -text
+ assets/teaser_img.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,9 +1,89 @@
- ---
- license: apache-2.0
- ---
+ ## Prompt-Diffusion: In-Context Learning Unlocked for Diffusion Models
+ ### [Project Page](https://zhendong-wang.github.io/prompt-diffusion.github.io/) | [Paper](https://arxiv.org/abs/2305.01115)
+ ![Illustration](./assets/teaser_img.png)

  **In-Context Learning Unlocked for Diffusion Models**<br>
  Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang and Mingyuan Zhou <br>

- We provide the pre-trained model checkpoints here; more details about running them can be found on our [Github Page](https://github.com/Zhendong-Wang/Prompt-Diffusion).
+ [//]: # (https://arxiv.org/abs/2206.02262 <br>)

+ Abstract: *We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.
+ Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and text guidance,
+ our model automatically understands the underlying task and performs the same task on a new query image following the text guidance.
+ To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input.
+ The diffusion model is trained jointly on six different tasks using these prompts.
+ The resulting Prompt Diffusion model becomes the first diffusion-based vision-language foundation model capable of in-context learning.
+ It demonstrates high-quality in-context generation for the trained tasks and effectively generalizes to new, unseen vision tasks using their respective prompts.
+ Our model also shows compelling text-guided image editing results. Our framework aims to facilitate research into in-context learning for computer vision, with code publicly available here.*
+
+ ![Illustration](./assets/illustration.png)
+
+ ## ToDos
+ - [x] Release pretrained models
+ - [x] Release play-around codes
+
+
+ ## Results
+ ### Multi-Task Learning
+
+ ![Illustration](./assets/multi_task_results.png)
+
+ ### Generalization to New Tasks
+
+ ![Illustration](./assets/generalization_results.png)
+
+ ### Image Editing Ability
+
+ ![Illustration](./assets/edit_results.png)
+
+ ## Train Prompt Diffusion
+
+ ### Prepare Dataset
+
+ We use the public dataset proposed by [InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix) as our base dataset,
+ which consists of around 310k image-caption pairs. Furthermore, we apply the [ControlNet](https://github.com/lllyasviel/ControlNet) annotators
+ to collect image conditions such as HED/Depth/Segmentation maps of images. The code for collecting image conditions is provided in `annotate_data.py`.
+
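+ For reference, here is a minimal sketch of what one annotation step looks like with the ControlNet annotators. The input path is a placeholder and only two of the three condition types are shown; the full collection pipeline lives in `annotate_data.py`:
+
+ ```python
+ # Sketch: collect HED and depth conditions with ControlNet's annotators.
+ # Assumes the ControlNet repo's `annotator` package is importable and a
+ # CUDA device is available (the annotators load their models on GPU).
+ import cv2
+ from annotator.hed import HEDdetector
+ from annotator.midas import MidasDetector
+
+ apply_hed = HEDdetector()
+ apply_midas = MidasDetector()
+
+ image = cv2.imread('example.png')               # placeholder input image
+ image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # annotators expect RGB
+
+ hed_map = apply_hed(image)                      # HxW uint8 edge map
+ depth_map, _ = apply_midas(image)               # MiDaS returns (depth, normal)
+
+ cv2.imwrite('example_hed.png', hed_map)
+ cv2.imwrite('example_depth.png', depth_map)
+ ```
+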
+ ### Training
+
+ Training Prompt Diffusion is as easy as follows:
+
+ ```bash
+ # Create the initial Prompt Diffusion checkpoint from a Stable Diffusion checkpoint.
+ python tool_add_control.py 'path to your stable diffusion checkpoint, e.g., /.../v1-5-pruned-emaonly.ckpt' ./models/control_sd15_ini.ckpt
+
+ # Launch multi-task training from the initialized checkpoint.
+ python train.py --name 'experiment name' --gpus=8 --num_nodes=1 \
+     --logdir 'your logdir path' \
+     --data_config './models/dataset.yaml' --base './models/cldm_v15.yaml' \
+     --sd_locked
+ ```
+
+ We also provide the job script in `scripts/train_v1-5.sh` for an easy run.
+
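+ (As in the ControlNet code base this repo builds on, the first command presumably initializes the added control branch from the Stable Diffusion weights, while `--sd_locked` keeps the original Stable Diffusion weights frozen so that only the new branch is trained.)
+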
+ ## Run Prompt Diffusion from our checkpoints
+
+ We will update the code for playing with Prompt Diffusion and the model checkpoints soon.
+
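+ In the meantime, here is a hypothetical loading sketch following the ControlNet-style `cldm` helpers that the training configs above point to; the checkpoint filename and exact API are assumptions, not the released interface:
+
+ ```python
+ # Hypothetical sketch: load a Prompt Diffusion checkpoint via the
+ # ControlNet-style helpers referenced by ./models/cldm_v15.yaml.
+ # The checkpoint path below is a placeholder until the official release.
+ from cldm.model import create_model, load_state_dict
+
+ model = create_model('./models/cldm_v15.yaml').cpu()
+ model.load_state_dict(load_state_dict('./models/prompt_diffusion.ckpt', location='cuda'))
+ model = model.cuda().eval()
+ ```
+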
+ ## More Examples
+
+ ![Illustration](./assets/more_example_depth.png)
+ ![Illustration](./assets/more_example_hed.png)
+ ![Illustration](./assets/more_example_seg.png)
+
+
+ ## Citation
+
+ ```
+ @article{wang2023promptdiffusion,
+   title   = {In-Context Learning Unlocked for Diffusion Models},
+   author  = {Wang, Zhendong and Jiang, Yifan and Lu, Yadong and Shen, Yelong and He, Pengcheng and Chen, Weizhu and Wang, Zhangyang and Zhou, Mingyuan},
+   journal = {arXiv preprint arXiv:2305.01115},
+   year    = {2023},
+   url     = {https://arxiv.org/abs/2305.01115}
+ }
+ ```
+
+ ## Acknowledgements
+ We thank [Brooks et al.](https://github.com/timothybrooks/instruct-pix2pix) for sharing the dataset for finetuning Stable Diffusion.
+ We also thank [Lvmin Zhang and Maneesh Agrawala](https://github.com/lllyasviel/ControlNet) for providing the awesome ControlNet code base.
assets/edit_results.png ADDED

Git LFS Details

  • SHA256: 5dc605acdde3ba315e2716488159da25349aaefb8bb1f0f6ebcc64bda0d23d6e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.11 MB
assets/generalization_results.png ADDED

Git LFS Details

  • SHA256: c69ee0a9b8448852f24aef3653f735fe960d63483ba4e5295ad86f5a74f115b7
  • Pointer size: 132 Bytes
  • Size of remote file: 1.73 MB
assets/illustration.png ADDED
assets/more_example_depth.png ADDED

Git LFS Details

  • SHA256: 2ac5ccf72797e4c0c526dca2e38e1eb55325d08e6938c41bff25fc7df0fc4820
  • Pointer size: 132 Bytes
  • Size of remote file: 5.18 MB
assets/more_example_hed.png ADDED

Git LFS Details

  • SHA256: 99e2e799bd91a890ce252e818247caa623c7c1cc7250111ac6635cc1c5755a56
  • Pointer size: 132 Bytes
  • Size of remote file: 5.33 MB
assets/more_example_seg.png ADDED

Git LFS Details

  • SHA256: cb117cebe0d16a5f5d5f7c3751cc709f868ba00f70ab6eb65b432c2b8555d1ef
  • Pointer size: 132 Bytes
  • Size of remote file: 5.11 MB
assets/multi_task_results.png ADDED

Git LFS Details

  • SHA256: ca5bc344572b0a70daec9e44b82bdd3907a441142702316bdde6ff675823c29a
  • Pointer size: 132 Bytes
  • Size of remote file: 3.43 MB
assets/teaser_img.png ADDED

Git LFS Details

  • SHA256: 4ea8da7bb50db2fee3c98e1aedf1bc0691fe77cc8fbeb85f251535306d9baa0d
  • Pointer size: 132 Bytes
  • Size of remote file: 2.61 MB