---
license: apache-2.0
---
# AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

[Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ&hl=en), Yanchen Liu, Yinhao Tang, [Wenjie Pei](https://wenjiepei.github.io/) and [Kai Chen*](https://chenkai.site/)

**Shanghai AI Laboratory**

![](./assets/teaser.png "AnyControl")


## Overview
The field of text-to-image (T2I) generation has made significant progress in recent years,
largely driven by advancements in diffusion models.
Linguistic control enables effective content creation, but struggles with fine-grained control over image generation.
This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions,
such as depth maps and edge maps, into pre-trained T2I models through extra encoding.
However, multi-control image synthesis still faces several challenges.
Specifically, current approaches are limited in handling free combinations of diverse input control signals,
overlook the complex relationships among multiple spatial conditions, and often fail to maintain semantic alignment with provided textual prompts.
This can lead to suboptimal user experiences. To address these challenges, we propose AnyControl,
a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals.
AnyControl develops a novel Multi-Control Encoder that extracts a unified multi-modal embedding to guide the generation process.
This approach enables a holistic understanding of user inputs and produces high-quality,
faithful results under versatile control signals, as demonstrated by extensive quantitative and qualitative evaluations.
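
As a rough illustration of this design (everything below, including module names, dimensions, and the attention layout, is an assumption for exposition, not AnyControl's actual implementation), a multi-control encoder can be sketched as a fixed set of learnable query tokens that attends jointly over any number of spatial-condition token streams plus the text tokens:

```python
import torch
import torch.nn as nn


class MultiControlEncoderSketch(nn.Module):
    """Illustrative sketch only: learnable query tokens attend over tokens
    from any number of spatial conditions plus the text tokens, producing
    one unified multi-modal embedding. Names, dimensions, and structure
    are assumptions, not AnyControl's code."""

    def __init__(self, dim: int = 768, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, condition_tokens, text_tokens):
        # Concatenate tokens from an arbitrary set of spatial conditions
        # (depth, edges, ...) with the text tokens along the sequence axis.
        context = torch.cat(list(condition_tokens) + [text_tokens], dim=1)
        queries = self.queries.unsqueeze(0).expand(context.shape[0], -1, -1)
        # Cross-attention compresses the variable-length context into a
        # fixed-size unified embedding that would guide the diffusion model.
        unified, _ = self.attn(queries, context, context)
        return unified


# Two hypothetical control signals with different token counts plus a prompt.
depth = torch.randn(1, 196, 768)
edges = torch.randn(1, 64, 768)
text = torch.randn(1, 77, 768)
unified = MultiControlEncoderSketch()([depth, edges], text)
print(unified.shape)  # torch.Size([1, 64, 768])
```

The fixed query set is what makes free combinations workable in this sketch: the unified embedding keeps the same shape whether one or five control signals are supplied.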


## Model Card
AnyControl for SD 1.5:
- `ckpts/anycontrol_15.ckpt`: weights for AnyControl.
- `ckpts/init_local.ckpt`: initial weights of AnyControl during training, generated following [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet).
- `ckpts/blip2_pretrained.pth`: pre-trained BLIP-2 weights (third-party model).
- `annotator/ckpts`: third-party models used in annotators.
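
For a quick sanity check after downloading, the files above can be inspected with plain PyTorch. This is a hedged sketch, not the repository's official loading code (which presumably instantiates the model from its config before loading weights); it only assumes the checkpoints are ordinary PyTorch files:

```python
import torch

# Load on CPU; assumes a standard PyTorch checkpoint, i.e. either a raw
# state dict or a dict that nests one under "state_dict".
ckpt = torch.load("ckpts/anycontrol_15.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

# Print a few parameter names and shapes to verify the download is intact.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```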


## License and Citation

All models and assets are under the [Apache 2.0 license](./LICENSE) unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@inproceedings{sun2024anycontrol,
  title={AnyControl: Create your artwork with versatile control on text-to-image generation},
  author={Sun, Yanan and Liu, Yanchen and Tang, Yinhao and Pei, Wenjie and Chen, Kai},
  booktitle={ECCV},
  year={2024}
}
```