Update README.md
README.md
CHANGED
@@ -1,3 +1,34 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
language:
|
4 |
+
- en
|
5 |
+
library_name: diffusers
|
6 |
+
tags:
|
7 |
+
- text-to-image
|
8 |
+
- stable diffusion
|
9 |
+
- personalization
|
10 |
+
- msdiffusion
|
11 |
+
---
|
12 |
+
|
13 |
+
# Introduction
|
14 |
+
|
15 |
+
MS-Diffusion is a framework for layout-guided, zero-shot image personalization with multiple subjects. It integrates grounding tokens with the feature resampler to preserve detail fidelity for each subject. With layout guidance, MS-Diffusion adapts the cross-attention to multi-subject inputs so that each subject condition acts on its own image region. The proposed multi-subject cross-attention composes the subjects coherently while preserving text control.
|
16 |
+
|
17 |
+
![example](imgs/teaser_new.png)
|
18 |
+
|
19 |
+
- **Project Page:** [https://eclipse-t2i.github.io/Lambda-ECLIPSE/](https://eclipse-t2i.github.io/Lambda-ECLIPSE/)
|
20 |
+
- **GitHub:** [https://github.com/Maitreyapatel/lambda-eclipse-inference](https://github.com/Maitreyapatel/lambda-eclipse-inference)
|
21 |
+
- **Paper (arXiv):** [https://arxiv.org/abs/2402.05195](https://arxiv.org/abs/2402.05195)
|
22 |
+
|
23 |
+
# Model
|
24 |
+
|
25 |
+
Download the pretrained base models from [SDXL-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [CLIP-G]().
|
26 |
+
|
27 |
+
Please refer to our [GitHub repository]() to prepare the environment and get detailed instructions on how to run the model.
|
28 |
+
|
29 |
+
# Important Notes
|
30 |
+
|
31 |
+
- This repo contains only the trained model checkpoint; it does not include data, code, or base models. Please follow the detailed instructions in the GitHub repository.
|
32 |
+
- The `scale` parameter determines the extent of image control. By default, `scale` is set to 0.6. In practice, a `scale` of 0.4 works better if your input contains subjects that need to affect the whole image, such as a background. **Feel free to adjust `scale` in your applications.**
|
33 |
+
- The model works best with layout inputs. You can use the default layouts in the inference script, but more accurate and realistic layouts produce better results.
|
34 |
+
- Although MS-Diffusion outperforms SOTA personalized diffusion methods in both single-subject and multi-subject generation, it is still affected by the background in subject images. The best practice is to use masked subject images, since they contain no irrelevant information.
|
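The notes on `scale` and layouts can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Box` class, `normalize_box`, and `pick_scale` are hypothetical helpers, and the normalized `[x0, y0, x1, y1]` box format is not confirmed by this README. Check the inference script in the GitHub repository for the format it actually expects. Only the two `scale` values (0.6 and 0.4) come from the notes above.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_SCALE = 0.6      # default value stated in the notes
WHOLE_IMAGE_SCALE = 0.4  # suggested when a subject (e.g. a background) affects the whole image


@dataclass(frozen=True)
class Box:
    """Hypothetical pixel box: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_box(box: Box, width: int, height: int) -> List[float]:
    """Convert a pixel box to [x0, y0, x1, y1] as fractions of the image size.

    The normalized format is an assumption, not the documented input format.
    """
    inside_x = 0 <= box.x0 < box.x1 <= width
    inside_y = 0 <= box.y0 < box.y1 <= height
    if not (inside_x and inside_y):
        raise ValueError(f"{box} is empty or outside a {width}x{height} image")
    return [box.x0 / width, box.y0 / height, box.x1 / width, box.y1 / height]


def pick_scale(affects_whole_image: bool) -> float:
    """Pick a starting `scale` following the guidance in the notes."""
    return WHOLE_IMAGE_SCALE if affects_whole_image else DEFAULT_SCALE


# Two subjects placed side by side in the lower part of a 1024x1024 image.
layout = [
    normalize_box(Box(0, 256, 512, 1024), 1024, 1024),
    normalize_box(Box(512, 256, 1024, 1024), 1024, 1024),
]
scale = pick_scale(affects_whole_image=False)
```

Validating boxes before inference catches empty or out-of-bounds regions early. The values returned by `pick_scale` are only starting points and should be tuned per application.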