---
license: apache-2.0
language:
- en
pipeline_tag: image-to-image
tags:
- Diffusion Transformer
- Image Editing
- Scepter
- ACE
---
<h2 align="center">
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
</h2>
<h3 align="center">
<b>Tongyi Lab, Alibaba Group</b>
</h3>
<div align="center">
[**Paper**](https://arxiv.org/abs/2410.00086) **|** [**Project Page**](https://ali-vilab.github.io/ace-page/) **|** [**Code**](https://github.com/ali-vilab/ACE)
</div>
ACE is a unified foundation model framework that supports a wide range of visual generation tasks.
By defining a Condition Unit (CU) that unifies multi-modal inputs across different tasks, and extending it to a long-context CU,
we introduce historical contextual information into visual generation tasks, paving
the way for ChatGPT-like dialog systems in visual generation.
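
The idea of a CU and its long-context variant can be pictured with a small data-structure sketch. This is an illustration only, not the ACE implementation: it assumes a CU (Condition Unit in the paper) pairs a text instruction with optional images and masks, and that the long-context form keeps the earlier turns of a dialog. The class and field names (`ConditionUnit`, `LongContextConditionUnit`, `instruction`, `images`, `masks`, `history`) are hypothetical and do not come from the ACE codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConditionUnit:
    """One turn of input (hypothetical layout): a text instruction plus visual inputs."""
    instruction: str
    # Image references; plain strings stand in for real image data here.
    images: List[str] = field(default_factory=list)
    # One optional mask reference per image (None means "no mask").
    masks: List[Optional[str]] = field(default_factory=list)


@dataclass
class LongContextConditionUnit:
    """All turns of a dialog so far, oldest first (hypothetical layout)."""
    history: List[ConditionUnit] = field(default_factory=list)

    def append(self, unit: ConditionUnit) -> None:
        # Add the newest turn to the end of the dialog history.
        self.history.append(unit)

    def instructions(self) -> List[str]:
        # Collect the text instructions of every turn in order.
        return [unit.instruction for unit in self.history]


# Two-turn example: generate an image, then edit the result.
lcu = LongContextConditionUnit()
lcu.append(ConditionUnit("Generate a photo of a red bicycle."))
lcu.append(
    ConditionUnit(
        "Make the bicycle blue.",
        images=["turn1_output.png"],  # placeholder file name
        masks=[None],
    )
)
print(lcu.instructions())
```

In this sketch the second turn can refer back to the first turn's output, which is the kind of historical context the long-context CU is meant to carry.
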
<p>
<table align="center">
<tr>
<td>
<img src="assets/figures/teaser.png">
</td>
</tr>
</table>
</p>
## BibTeX
```bibtex
@article{han2024ace,
title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
journal={arXiv preprint arXiv:2410.00086},
year={2024}
}
```