---
pipeline_tag: text-to-image
---

<p align="center">
<img src="assets/generated_logo.jpg" height=120>
</p>

### <div align="center">EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models</div>

---

This repository hosts the official weights of the model "Edgen", trained with EvolveDirector. For more details, please refer to our paper and code repo.

## Setup

### Requirements

1. Create a virtual environment for EvolveDirector

```shell
# create virtual environment for EvolveDirector
conda create -n evolvedirector python=3.9
conda activate evolvedirector

# cd to the path of this repo

# install packages
pip install --upgrade pip
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
```

## Usage

1. Inference

```shell
# put your text prompts in text_prompts.txt before running
python Inference/inference.py --image_size=1024 \
    --t5_path "./model" \
    --tokenizer_path "./model/sd-vae-ft-ema" \
    --txt_file "text_prompts.txt" \
    --model_path "model/Edgen_1024px_v1.pth" \
    --save_folder "output/test_model"
```
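
The format of the prompt file is not spelled out here. The following is a minimal sketch, assuming `Inference/inference.py` reads one prompt per line from the file passed via `--txt_file` (an assumption; check the code repo to confirm). The two prompts are made-up examples.

```shell
# Assumption: one text prompt per line in the file passed via --txt_file.
# The prompts below are illustrative placeholders, not from the paper.
cat > text_prompts.txt <<'EOF'
A red fox sitting in a snowy forest at sunrise
A watercolor painting of a lighthouse on a rocky coast
EOF

# Sanity check: print how many prompts were written
wc -l < text_prompts.txt
```

If the script expects a different layout, only the heredoc contents need to change; the inference command itself stays the same.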

## Citation

```bibtex
@article{zhao2024evolvedirector,
  title={EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models},
  author={Zhao, Rui and Yuan, Hangjie and Wei, Yujie and Zhang, Shiwei and Gu, Yuchao and Ran, Lingmin and Wang, Xiang and Wu, Zhangjie and Zhang, Junhao and Zhang, Yingya and others},
  journal={arXiv preprint arXiv:2410.07133},
  year={2024}
}
```

## Shoutouts

- This code builds heavily on [PixArt-$\alpha$](https://github.com/PixArt-alpha/PixArt-alpha/). Thanks for open-sourcing!