Add pipeline tag and library name to model card

#1
by nielsr - opened
Files changed (1)
  1. README.md +144 -3
README.md CHANGED
@@ -1,3 +1,144 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ ---
+
+ # PhotoDoodle
+
+ > **PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data**
+ > <br>
+ > [Huang Shijie](https://scholar.google.com/citations?user=HmqYYosAAAAJ),
+ > [Yiren Song](https://scholar.google.com.hk/citations?user=L2YS0jgAAAAJ),
+ > [Yuxuan Zhang](https://xiaojiu-z.github.io/YuxuanZhang.github.io/),
+ > [Hailong Guo](https://github.com/logn-2024),
+ > Xueyin Wang,
+ > [Mike Zheng Shou](https://sites.google.com/view/showlab),
+ > and
+ > [Liu Jiaming](https://scholar.google.com/citations?user=SmL7oMQAAAAJ&hl=en)
+ > <br>
+ > [Show Lab](https://sites.google.com/view/showlab), National University of Singapore
+ > <br>
+
+ <a href="https://arxiv.org/abs/2502.14397"><img src="https://img.shields.io/badge/arXiv-2502.14397-A42C25.svg" alt="arXiv"></a>
+ <a href="https://huggingface.co/nicolaus-huang/PhotoDoodle"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
+ <a href="https://huggingface.co/datasets/nicolaus-huang/PhotoDoodle/"><img src="https://img.shields.io/badge/🤗_HuggingFace-Dataset-ffbd45.svg" alt="HuggingFace"></a>
+
+ <br>
+
+ <img src='./assets/teaser.png' width='100%' />
+
+
+ ## Quick Start
+ ### Configuration
+ #### 1. **Environment setup**
+ ```bash
+ git clone git@github.com:showlab/PhotoDoodle.git
+ cd PhotoDoodle
+
+ conda create -n doodle python=3.11.10
+ conda activate doodle
+ ```
+ #### 2. **Requirements installation**
+ ```bash
+ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
+ pip install --upgrade -r requirements.txt
+ ```
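+
+ To confirm the environment is set up correctly, a quick sanity check such as the one below can help (a minimal sketch; it only assumes `torch` and `diffusers` were installed by the commands above):
+
+ ```python
+ # Verify that PyTorch sees the GPU and that diffusers imports cleanly.
+ import torch
+ import diffusers
+
+ print(f"torch {torch.__version__}, diffusers {diffusers.__version__}")
+ print(f"CUDA available: {torch.cuda.is_available()}")
+ ```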
+
+
+ ### 2. Inference
+ We provide a Diffusers pipeline integration for our model and have uploaded the model weights to the Hugging Face Hub, so the model is easy to use, as in the example below:
+
+ ```python
+ import torch
+ from PIL import Image
+
+ from src.pipeline_pe_clone import FluxPipeline
+
+ pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
+ pipeline = FluxPipeline.from_pretrained(
+     pretrained_model_name_or_path,
+     torch_dtype=torch.bfloat16,
+ ).to('cuda')
+
+ # Load and fuse the base PhotoDoodle weights, then load the effect-specific LoRA.
+ pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="pretrain.safetensors")
+ pipeline.fuse_lora()
+ pipeline.unload_lora_weights()
+
+ pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="sksmagiceffects.safetensors")
+
+ height = 768
+ width = 512
+
+ validation_image = "assets/1.png"
+ validation_prompt = "add a halo and wings for the cat by sksmagiceffects"
+ # Note: PIL's resize expects (width, height).
+ condition_image = Image.open(validation_image).resize((width, height)).convert("RGB")
+
+ result = pipeline(prompt=validation_prompt,
+                   condition_image=condition_image,
+                   height=height,
+                   width=width,
+                   guidance_scale=3.5,
+                   num_inference_steps=20,
+                   max_sequence_length=512).images[0]
+
+ result.save("output.png")
+ ```
+
+ Or simply run the inference script:
+ ```bash
+ python inference.py
+ ```
+
+
+
+ ### 3. Weights
+ You can download the trained PhotoDoodle checkpoints for inference. The available models are listed below; each checkpoint name is also its trigger word.
+
+ You need to load and fuse the `pretrain.safetensors` checkpoint before loading any of the other models, as shown in the sketch after the table below.
+
+ | **Model** | **Description** | **Resolution (H × W)** |
+ | :----------------------------------------------------------: | :---------------------------------------------------------: | :------------: |
+ | [pretrained](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/pretrain.safetensors) | PhotoDoodle model trained on the `SeedEdit` dataset | 768 × 768 |
+ | [sksmonstercalledlulu](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksmonstercalledlulu.safetensors) | PhotoDoodle model trained on the `Cartoon monster` dataset | 768 × 512 |
+ | [sksmagiceffects](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksmagiceffects.safetensors) | PhotoDoodle model trained on the `3D effects` dataset | 768 × 512 |
+ | [skspaintingeffects](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/skspaintingeffects.safetensors) | PhotoDoodle model trained on the `Flowing color blocks` dataset | 768 × 512 |
+ | [sksedgeeffect](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksedgeeffect.safetensors) | PhotoDoodle model trained on the `Hand-drawn outline` dataset | 768 × 512 |
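+
+ For example, to switch to another effect you load and fuse `pretrain.safetensors` once and then load the effect checkpoint you want on top (a minimal sketch mirroring the inference example above; the prompt text here is only illustrative):
+
+ ```python
+ import torch
+ from src.pipeline_pe_clone import FluxPipeline
+
+ pipeline = FluxPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-dev",
+     torch_dtype=torch.bfloat16,
+ ).to('cuda')
+
+ # Base PhotoDoodle weights: load once and fuse into the model.
+ pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="pretrain.safetensors")
+ pipeline.fuse_lora()
+ pipeline.unload_lora_weights()
+
+ # Effect-specific LoRA; the checkpoint name doubles as the trigger word in the prompt.
+ pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="sksmonstercalledlulu.safetensors")
+ prompt = "add a cartoon monster next to the girl by sksmonstercalledlulu"  # illustrative prompt
+ ```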
+
+
+ ### 4. Dataset
+ <span id="dataset_setting"></span>
+ #### 4.1 Dataset settings
+ Training uses a paired dataset stored in a `.jsonl` file. Each entry contains the source image path, the target (modified) image path, and a caption describing the modification.
+
+ Example format:
+
+ ```json
+ {"source": "path/to/source.jpg", "target": "path/to/modified.jpg", "caption": "Instruction of modifications"}
+ {"source": "path/to/source2.jpg", "target": "path/to/modified2.jpg", "caption": "Another instruction"}
+ ```
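+
+ Such a file can be read entry by entry with standard Python (a minimal sketch; the file name `train.jsonl` is only an example, and the field names follow the format above):
+
+ ```python
+ import json
+ from PIL import Image
+
+ # Read the paired entries: each line is one JSON object.
+ with open("train.jsonl", "r", encoding="utf-8") as f:
+     entries = [json.loads(line) for line in f if line.strip()]
+
+ for entry in entries:
+     source = Image.open(entry["source"]).convert("RGB")
+     target = Image.open(entry["target"]).convert("RGB")
+     caption = entry["caption"]
+ ```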
+
+ We have uploaded our datasets to [Hugging Face](https://huggingface.co/datasets/nicolaus-huang/PhotoDoodle).
+
+
+ ### 5. Results
+
+ ![R-F](./assets/R-F.jpg)
+
+
+ ### 6. Acknowledgments
+
+ 1. Thanks to **[Yuxuan Zhang](https://xiaojiu-z.github.io/YuxuanZhang.github.io/)** and **[Hailong Guo](mailto:guohailong@bupt.edu.cn)** for providing the code base.
+ 2. Thanks to **[Diffusers](https://github.com/huggingface/diffusers)** for the open-source project.
+
+ ## Citation
+ ```bibtex
+ @misc{huang2025photodoodlelearningartisticimage,
+       title={PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data},
+       author={Shijie Huang and Yiren Song and Yuxuan Zhang and Hailong Guo and Xueyin Wang and Mike Zheng Shou and Jiaming Liu},
+       year={2025},
+       eprint={2502.14397},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2502.14397},
+ }
+ ```