---
license: apache-2.0
---
# AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

[Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ&hl=en), Yanchen Liu, Yinhao Tang, [Wenjie Pei](https://wenjiepei.github.io/) and [Kai Chen*](https://chenkai.site/)

**Shanghai AI Laboratory**

![AnyControl teaser](./assets/teaser.png "AnyControl")

## Overview

The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advances in diffusion models. Linguistic control enables effective content creation but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and edge maps, into pre-trained T2I models through extra encoders. However, multi-control image synthesis still faces several challenges. Specifically, current approaches are limited in handling free combinations of diverse input control signals, overlook the complex relationships among multiple spatial conditions, and often fail to maintain semantic alignment with the provided textual prompts, which can lead to suboptimal user experiences. To address these challenges, we propose AnyControl, a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals. AnyControl develops a novel Multi-Control Encoder that extracts a unified multi-modal embedding to guide the generation process. This approach enables a holistic understanding of user inputs and produces high-quality, faithful results under versatile control signals, as demonstrated by extensive quantitative and qualitative evaluations.

## Model Card

AnyControl for SD 1.5:
- `ckpts/anycontrol_15.ckpt`: weights for AnyControl.
- `ckpts/init_local.ckpt`: initial weights of AnyControl during training, generated following [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet).
- `ckpts/blip2_pretrained.pth`: third-party pre-trained BLIP-2 model.
- `annotator/ckpts`: third-party models used by the annotators.

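Before running inference or training, it can help to check that the checkpoints listed above are actually in place. A minimal sketch, using only the standard library; the helper name `missing_checkpoints` is ours for illustration and not part of the repository:

```python
from pathlib import Path

# Expected checkpoint layout, taken from the Model Card above.
EXPECTED_CKPTS = [
    "ckpts/anycontrol_15.ckpt",    # AnyControl weights (SD 1.5)
    "ckpts/init_local.ckpt",       # initial training weights
    "ckpts/blip2_pretrained.pth",  # third-party BLIP-2 model
    "annotator/ckpts",             # third-party annotator models
]


def missing_checkpoints(root="."):
    """Return the expected checkpoint paths not found under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_CKPTS if not (root / p).exists()]
```

Run it from the repository root; an empty return value means every expected file or directory is present.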
## License and Citation

All models and assets are released under the [Apache 2.0 license](./LICENSE) unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@inproceedings{sun2024anycontrol,
  title={AnyControl: Create your artwork with versatile control on text-to-image generation},
  author={Sun, Yanan and Liu, Yanchen and Tang, Yinhao and Pei, Wenjie and Chen, Kai},
  booktitle={ECCV},
  year={2024}
}
```