Update README.md
---
license: mit
language:
- ak
library_name: diffusers
---
<p align="center">
<img src="https://github.com/JackAILab/ConsistentID/assets/135965025/c0594480-d73d-4268-95ca-5494ca2a61e4" height=100>
</p>

<!-- ## <div align="center"><b>ConsistentID</b></div> -->

<div align="center">

## ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md-dark.svg)]()
[[Paper](https://arxiv.org/abs/2404.16771)]   [[Project Page](https://ssugarwh.github.io/consistentid.github.io/)]   [[Gradio Demo](http://consistentid.natapp1.cc/)] <br>

</div>

### **Key Features:**

1. Portrait generation with extremely high **ID fidelity**, without sacrificing diversity or text controllability.
2. Introduces **FaceParsing** and **FaceID** information into the diffusion model.
3. Rapid customization **within seconds**, with no additional LoRA training.
4. Can serve as an **Adapter**, collaborating with other base models and community LoRA modules.

---
## **Examples**

<p align="center">
<img src="https://github.com/JackAILab/ConsistentID/assets/135965025/f949a03d-bed2-4839-a995-7b451d8c981b" height=450>
</p>

## To-Do List
Your star will help this project move forward.
- [x] Release training and evaluation code, along with the demo!
- [ ] Retrain with more data and the SDXL base model to enhance aesthetics and generalization.
- [ ] Release a multi-ID-input version to guide the improvement of ID diversity.
- [ ] Optimize the training and inference structure to further improve text following and ID decoupling.

## Abstract

ConsistentID is an AIGC work that introduces FaceParsing and FaceID information into the diffusion model. Previous work mainly focused on overall ID preservation, and even in recently proposed fine-grained ID preservation models such as InstantID, the injection of facial ID features is fixed. To maintain the consistency of fine-grained facial IDs more flexibly, we reconstructed a multimodal fine-grained ID dataset of 50,000 samples for training the proposed FacialEncoder model, which supports common functions such as personalized photos, gender/age changes, and identity mixing.

At the same time, we define FGIS, a unified measurement benchmark for fine-grained identity preservation that covers several common personalized facial scenes and characters, and we construct a baseline for fine-grained ID preservation models.

Finally, extensive experiments show that ConsistentID achieves SOTA results on facial personalization tasks. They also verify that ConsistentID can improve ID consistency and even modify facial features through finer-grained prompts, which opens up a direction for future research on fine-grained facial personalization.
## π§ Requirements
|
57 |
+
|
58 |
+
To install requirements:
|
59 |
+
|
60 |
+
```setup
|
61 |
+
pip3 install -r requirements.txt
|
62 |
+
```
|
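
Since the model card declares `library_name: diffusers` and diffusion models generally need a CUDA GPU, a quick post-install check can save time later. This is a minimal sketch, not part of the repo; the exact version pins are whatever `requirements.txt` specifies:

```python
# Minimal post-install check: confirms torch sees a GPU and diffusers imports cleanly.
import torch
import diffusers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
```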

## Data Preparation

Prepare the data in the following format:

├── data
|   ├── JSON_all.json
|   ├── resize_IMG          # Images
|   ├── all_faceID          # FaceID
|   └── parsing_mask_IMG    # Parsing Mask

The `.json` file should look like this:
```
[
    {
        "resize_IMG": "Path to resized image...",
        "parsing_color_IMG": "...",
        "parsing_mask_IMG": "...",
        "vqa_llva": "...",
        "id_embed_file_resize": "...",
        "vqa_llva_more_face_detail": "..."
    },
    ...
]
```
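
Before training, it can help to verify that every record in `JSON_all.json` points to files that actually exist on disk. The sketch below is only a convenience check, not part of the repo; the field names come from the format above, and the `data/JSON_all.json` location is assumed from the directory layout:

```python
# Sanity-check the annotation file: flag records whose file paths do not exist.
import json
from pathlib import Path

# Assumed locations based on the layout above; adjust if your data lives elsewhere.
ANNOTATIONS = Path("data") / "JSON_all.json"

# Fields from the record format above that reference files on disk.
PATH_FIELDS = ["resize_IMG", "parsing_color_IMG", "parsing_mask_IMG", "id_embed_file_resize"]

with open(ANNOTATIONS, "r", encoding="utf-8") as f:
    samples = json.load(f)

missing = 0
for sample in samples:
    for field in PATH_FIELDS:
        if not Path(sample[field]).exists():
            missing += 1
            print(f"missing {field}: {sample[field]}")

print(f"checked {len(samples)} samples, {missing} missing file references")
```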

## Train
Ensure that the workspace is the root directory of the project.

```setup
bash train_bash.sh
```

## Infer
Ensure that the workspace is the root directory of the project.

```setup
python infer.py
```
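
`infer.py` is the entry point provided by this repo. For orientation only, the sketch below shows the plain diffusers text-to-image call that such a pipeline builds on (the card lists `library_name: diffusers`); it does not apply ConsistentID's FaceParsing/FaceID conditioning, and the base checkpoint id is an assumption rather than the one `infer.py` actually loads:

```python
# Plain Stable Diffusion call via diffusers, for orientation only.
# It does NOT inject ConsistentID's ID conditioning -- use infer.py for that.
import torch
from diffusers import StableDiffusionPipeline

# Assumed SD 1.5 checkpoint; the base model used by infer.py may differ.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="A close-up portrait of a person, photorealistic, natural lighting",
    negative_prompt="lowres, blurry, deformed face",
    num_inference_steps=50,
    guidance_scale=5.0,
).images[0]
image.save("portrait.png")
```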

## Model weights
We are hosting the model weights on **Hugging Face** for a faster and more stable demo experience, so stay tuned!

The pre-trained model parameters can now be downloaded from [Google Drive](https://drive.google.com/file/d/1jCHICryESmNkzGi8J_FlY3PjJz9gqoSI/view?usp=drive_link) or [Baidu Netdisk](https://pan.baidu.com/s/1NAVmH8S7Ls5rZc-snDk1Ng?pwd=nsh6).
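
If you prefer to script the Google Drive download, a small sketch using the `gdown` package (not listed in this card, so install it separately) could look like this; the file URL is the one above, and the saved filename is left to gdown, which keeps the name stored on Drive:

```python
# Download the pre-trained ConsistentID weights from the Google Drive link above.
# Requires: pip install gdown
import gdown

DRIVE_URL = "https://drive.google.com/file/d/1jCHICryESmNkzGi8J_FlY3PjJz9gqoSI/view?usp=drive_link"

# fuzzy=True lets gdown extract the file id from a full "file/d/<id>/view" URL;
# with output=None the file keeps its original name from Drive.
gdown.download(url=DRIVE_URL, output=None, quiet=False, fuzzy=True)
```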