---
license: apache-2.0
language:
- en
pipeline_tag: image-to-image
tags:
- Diffusion Transformer
- Image Editing
- Scepter
- ACE
---
<h2 align="center">
    ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
</h2>

<h3 align="center">
    <b>Tongyi Lab, Alibaba Group</b>
</h3>

<div align="center">

[**Paper**](https://arxiv.org/abs/2410.00086) **|** [**Project Page**](https://ali-vilab.github.io/ace-page/) **|** [**Code**](https://github.com/ali-vilab/ACE)

</div>


ACE is a unified foundational model framework that supports a wide range of visual generation tasks.
By defining the Condition Unit (CU), a format that unifies multi-modal inputs across different tasks, and extending it to a long-context CU,
we bring historical contextual information into visual generation, paving
the way for ChatGPT-like dialog systems in this domain.
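
As a rough illustration only, the idea of a CU and its long-context variant can be pictured as a small data structure: one unit holds a text instruction plus any images and masks for a single turn, and the long-context form keeps earlier turns alongside the current one. The class names, field names, and file names below are hypothetical and are not taken from the ACE or Scepter code; this is a minimal sketch of the concept, not the actual input format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConditionUnit:
    """Hypothetical container for one turn: an instruction plus optional images and masks."""
    instruction: str
    image_paths: List[str] = field(default_factory=list)
    # One entry per image; None means "no mask, use the whole image".
    mask_paths: List[Optional[str]] = field(default_factory=list)


@dataclass
class LongContextConditionUnit:
    """Hypothetical history of earlier turns, ending with the current one."""
    turns: List[ConditionUnit] = field(default_factory=list)

    def append(self, unit: ConditionUnit) -> None:
        # Each new request is added after the turns that came before it.
        self.turns.append(unit)

    def instructions(self) -> List[str]:
        # Text instructions in conversation order, oldest first.
        return [unit.instruction for unit in self.turns]


# Example: a generation request followed by an edit that refers back to it.
history = LongContextConditionUnit()
history.append(ConditionUnit(instruction="Generate a photo of a red bicycle."))
history.append(
    ConditionUnit(
        instruction="Change the bicycle color to blue.",
        image_paths=["turn1_output.png"],  # hypothetical file name
        mask_paths=[None],
    )
)
```

Keeping every turn in one ordered list is what lets a later instruction such as "change the color" be resolved against earlier inputs and outputs.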

<p>
  <table align="center">
    <tr>
    <td>
      <img src="assets/figures/teaser.png">
    </td>
    </tr>
  </table>
</p>



## BibTeX

```bibtex
@article{han2024ace,
  title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
  author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2410.00086},
  year={2024}
}
```