---
license: apache-2.0
---

# AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

[Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ&hl=en), Yanchen Liu, Yinhao Tang, [Wenjie Pei](https://wenjiepei.github.io/) and [Kai Chen*](https://chenkai.site/)

**Shanghai AI Laboratory**

![](./assets/teaser.png "AnyControl")

## Overview

The field of text-to-image (T2I) generation has made significant progress in recent years,
largely driven by advancements in diffusion models.
Text prompts enable effective content creation but offer limited fine-grained control over image generation.
This limitation has been widely addressed by incorporating additional user-supplied spatial conditions,
such as depth maps and edge maps, into pre-trained T2I models through extra encoding.
However, multi-control image synthesis still faces several challenges.
Specifically, current approaches are limited in handling free combinations of diverse input control signals,
overlook the complex relationships among multiple spatial conditions, and often fail to maintain semantic alignment with the provided textual prompts.
This can lead to suboptimal user experiences. To address these challenges, we propose AnyControl,
a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals.
AnyControl introduces a novel Multi-Control Encoder that extracts a unified multi-modal embedding to guide the generation process.
This approach enables a holistic understanding of user inputs and produces high-quality,
faithful results under versatile control signals, as demonstrated by extensive quantitative and qualitative evaluations.

## Model Card

AnyControl for SD 1.5

- `ckpts/anycontrol_15.ckpt`: weights for AnyControl.
- `ckpts/init_local.ckpt`: initial weights of AnyControl during training, generated following [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet).
- `ckpts/blip2_pretrained.pth`: third-party model.
- `annotator/ckpts`: third-party models used in annotators.

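
After downloading, it can be useful to confirm that the files listed above sit where they are expected. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_checkpoints` and the assumption that paths are relative to the repository root are illustrative, not part of the AnyControl codebase.

```python
from pathlib import Path

# Paths copied from the Model Card list; assumed to be relative to the repo root.
EXPECTED_FILES = [
    "ckpts/anycontrol_15.ckpt",
    "ckpts/init_local.ckpt",
    "ckpts/blip2_pretrained.pth",
]
EXPECTED_DIRS = ["annotator/ckpts"]


def missing_checkpoints(root="."):
    """Return the expected paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All expected checkpoint paths are present.")
```

Since `ckpts/init_local.ckpt` is described as the initial weights used during training, it can likely be dropped from `EXPECTED_FILES` when only running inference.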
## License and Citation

All models and assets are under the [Apache 2.0 license](./LICENSE) unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@inproceedings{sun2024anycontrol,
  title={AnyControl: Create your artwork with versatile control on text-to-image generation},
  author={Sun, Yanan and Liu, Yanchen and Tang, Yinhao and Pei, Wenjie and Chen, Kai},
  booktitle={ECCV},
  year={2024}
}
```