fruhhfft committed on
Commit
3c258fa
1 Parent(s): 102ecc6

Upload README.md

  1. README.md +324 -0
README.md ADDED
@@ -0,0 +1,324 @@
---
tags:
- text-to-image
- stable-diffusion

language:
- en
library_name: diffusers
---

# IP-Adapter-FaceID Model Card

<div align="center">

[**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
</div>

---

## Introduction

This is an experimental version of IP-Adapter-FaceID. It uses a face ID embedding from a face recognition model instead of a CLIP image embedding, and additionally uses LoRA to improve ID consistency. IP-Adapter-FaceID can generate images in various styles conditioned on a face, using only text prompts.

![results](./ip-adapter-faceid.jpg)

**Update 2023/12/27**:

IP-Adapter-FaceID-Plus: face ID embedding (for face ID) + CLIP image embedding (for face structure)

<div align="center">

![results](./faceid-plus.jpg)
</div>

**Update 2023/12/28**:

IP-Adapter-FaceID-PlusV2: face ID embedding (for face ID) + controllable CLIP image embedding (for face structure)

You can adjust the weight of the face structure to get different generations.

<div align="center">

![results](./faceid_plusv2.jpg)
</div>

**Update 2024/01/04**:

IP-Adapter-FaceID-SDXL: an experimental SDXL version of IP-Adapter-FaceID

<div align="center">

![results](./sdxl_faceid.jpg)
</div>

## Usage

### IP-Adapter-FaceID

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding:

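```python
import cv2
import torch
from insightface.app import FaceAnalysis

# Load the detection and recognition models; CPU is used if CUDA is unavailable.
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")  # BGR array, or None if the file cannot be read
if image is None:
    raise FileNotFoundError("could not read person.jpg")

faces = app.get(image)
if not faces:
    raise ValueError("no face detected in person.jpg")

# Use the first detected face and add a batch dimension: shape (1, 512).
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```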
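
The snippet above takes `faces[0]`, which may not be the intended subject when several faces are detected. As a minimal sketch, assuming each detection exposes a `bbox` of `(x1, y1, x2, y2)` as insightface face objects do, the largest face can be selected instead. The helper name `pick_largest_face` and the stand-in detections are hypothetical and only serve to make the example runnable without insightface.

```python
from types import SimpleNamespace


def pick_largest_face(faces):
    """Return the detected face with the largest bounding-box area.

    Each item is assumed to expose `bbox` as (x1, y1, x2, y2).
    Raises ValueError when the list of detections is empty.
    """
    if not faces:
        raise ValueError("no face detected in the image")

    def area(face):
        x1, y1, x2, y2 = face.bbox
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


# Stand-in detections (hypothetical values) so the sketch runs on its own.
detections = [
    SimpleNamespace(name="background", bbox=(10, 10, 60, 70)),
    SimpleNamespace(name="subject", bbox=(100, 80, 400, 460)),
]
print(pick_largest_face(detections).name)
```

With real detections, `pick_largest_face(faces).normed_embedding` would take the place of `faces[0].normed_embedding`.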

Then generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None,
)

# load ip-adapter
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image (faceid_embeds comes from the extraction snippet above)
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt,
    negative_prompt=negative_prompt,
    faceid_embeds=faceid_embeds,
    num_samples=4,
    width=512,
    height=768,
    num_inference_steps=30,
    seed=2023,
)
```
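
The scheduler arguments above select the `"scaled_linear"` beta schedule. As a rough illustration of what those numbers mean, the sketch below rebuilds that schedule in plain Python: betas are spaced linearly between the square roots of `beta_start` and `beta_end`, then squared. The function names are hypothetical and this is not the diffusers implementation, only the same arithmetic.

```python
def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Betas spaced linearly in sqrt-space, then squared."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def alphas_cumprod(betas):
    """Running product of (1 - beta_t) over the timesteps."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = scaled_linear_betas()
acp = alphas_cumprod(betas)
print(f"beta[0]={betas[0]:.5f}  beta[-1]={betas[-1]:.5f}")
print(f"alphas_cumprod[0]={acp[0]:.5f}  alphas_cumprod[-1]={acp[-1]:.5f}")
```

The cumulative product shrinks toward zero as the timestep grows, which corresponds to samples becoming almost pure noise at the end of the forward process.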

You can also load the model as a normal IP-Adapter plus a normal LoRA:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid_separate import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
lora_ckpt = "ip-adapter-faceid_sd15_lora.safetensors"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None,
)

# load lora and fuse
pipe.load_lora_weights(lora_ckpt)
pipe.fuse_lora()

# load ip-adapter
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image (faceid_embeds comes from the extraction snippet above)
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt,
    negative_prompt=negative_prompt,
    faceid_embeds=faceid_embeds,
    num_samples=4,
    width=512,
    height=768,
    num_inference_steps=30,
    seed=2023,
)
```
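
For readers unfamiliar with the "load lora and fuse" step, the idea behind fusing is that a low-rank update is folded into the base weight once, so no extra LoRA computation is needed at inference time. The toy sketch below shows that arithmetic on nested lists. The function names and numbers are hypothetical, and the real library also handles details such as per-layer scaling that are omitted here.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_low_rank_update(weight, lora_up, lora_down, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down) as a new matrix."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 update (hypothetical numbers).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_up = [[1.0], [2.0]]     # shape (2, 1)
lora_down = [[0.5, -0.5]]    # shape (1, 2)

fused = fuse_low_rank_update(weight, lora_up, lora_down)
print(fused)
```

Setting `scale=0.0` in this sketch returns the base weight unchanged, which is a quick way to see that the update is purely additive.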

### IP-Adapter-FaceID-SDXL

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding:

```python
import cv2
import torch
from insightface.app import FaceAnalysis

# Load the detection and recognition models; CPU is used if CUDA is unavailable.
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")  # BGR array, or None if the file cannot be read
if image is None:
    raise FileNotFoundError("could not read person.jpg")

faces = app.get(image)
if not faces:
    raise ValueError("no face detected in person.jpg")

# Use the first detected face and add a batch dimension: shape (1, 512).
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```

Then generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDXL

base_model_path = "SG161222/RealVisXL_V3.0"
ip_ckpt = "ip-adapter-faceid_sdxl.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    add_watermarker=False,
)

# load ip-adapter
ip_model = IPAdapterFaceIDXL(pipe, ip_ckpt, device)

# generate image (faceid_embeds comes from the extraction snippet above)
prompt = "A closeup shot of a beautiful Asian teenage girl in a white dress wearing small silver earrings in the garden, under the soft morning light"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt,
    negative_prompt=negative_prompt,
    faceid_embeds=faceid_embeds,
    num_samples=2,
    width=1024,
    height=1024,
    num_inference_steps=30,
    guidance_scale=7.5,
    seed=2023,
)
```
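
The `width` and `height` values used in these examples (512x768, 1024x1024) are all multiples of 8, which Stable Diffusion pipelines in diffusers require. If you want other sizes, a small helper like the hypothetical one below can snap a requested dimension down to a valid value before calling `generate`. It is a convenience sketch, not part of the IP-Adapter code.

```python
def snap_to_multiple(value, multiple=8):
    """Round a pixel dimension down to the nearest positive multiple."""
    if value < multiple:
        raise ValueError(f"dimension {value} is smaller than {multiple}")
    return (value // multiple) * multiple


for width, height in [(1024, 1024), (512, 768), (1000, 750)]:
    print((width, height), "->", (snap_to_multiple(width), snap_to_multiple(height)))
```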

### IP-Adapter-FaceID-Plus

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding and the face image:

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from insightface.utils import face_align

# Load the detection and recognition models; CPU is used if CUDA is unavailable.
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")  # BGR array, or None if the file cannot be read
if image is None:
    raise FileNotFoundError("could not read person.jpg")

faces = app.get(image)
if not faces:
    raise ValueError("no face detected in person.jpg")

# Use the first detected face and add a batch dimension: shape (1, 512).
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
# Aligned 224x224 face crop (you can also segment the face).
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)
```

Then generate images conditioned on the face embedding and the cropped face image:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

v2 = False  # set to True to use the IP-Adapter-FaceID-PlusV2 checkpoint
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plus_sd15.bin" if not v2 else "ip-adapter-faceid-plusv2_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None,
)

# load ip-adapter
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# generate image (faceid_embeds and face_image come from the extraction snippet above)
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt,
    negative_prompt=negative_prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=v2,
    s_scale=1.0,
    num_samples=4,
    width=512,
    height=768,
    num_inference_steps=30,
    seed=2023,
)
```
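
This README does not spell out how `shortcut` and `s_scale` enter the computation. One plausible reading, stated here purely as an assumption, is that with `shortcut=True` (PlusV2) the CLIP-derived face-structure tokens are added to the face ID tokens as a residual weighted by `s_scale`, so a smaller `s_scale` leans more on identity and less on structure. The toy sketch below illustrates that reading with plain lists. `combine_tokens` and all values are hypothetical and do not come from the IP-Adapter code.

```python
def combine_tokens(id_tokens, structure_tokens, shortcut=False, s_scale=1.0):
    """Toy combination of face-ID tokens and face-structure tokens.

    Assumption (not confirmed by this README): without the shortcut the
    structure-aware tokens are used directly; with it they are added to
    the ID tokens as a residual weighted by s_scale.
    """
    if not shortcut:
        return list(structure_tokens)
    return [i + s_scale * s for i, s in zip(id_tokens, structure_tokens)]


id_tokens = [1.0, 2.0]          # hypothetical values
structure_tokens = [0.5, -1.0]  # hypothetical values

for s in (0.0, 0.5, 1.0):
    print(s, combine_tokens(id_tokens, structure_tokens, shortcut=True, s_scale=s))
```

In practice, the simplest way to explore the effect is to call `ip_model.generate` several times with the same `seed` and different `s_scale` values and compare the outputs.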

## Limitations and Bias
- The model does not achieve perfect photorealism or ID consistency.
- Generalization is limited by the training data, the base model, and the face recognition model.

## Non-commercial use
**This model is released exclusively for research purposes and is not intended for commercial use.**