---
pipeline_tag: text-to-image
---
<p align="center">
<img src="assets/generated_logo.jpg" height=120>
</p>
### <div align="center">EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models</div>
---
These are the official weights of the model "Edgen", trained by EvolveDirector. For more details, please refer to our paper and code repo.
## Setup
### Requirements
1. Build a virtual environment for EvolveDirector
```shell
# create virtual environment for EvolveDirector
conda create -n evolvedirector python=3.9
conda activate evolvedirector
# cd to the path of this repo
# install packages
pip install --upgrade pip
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
```
## Usage
1. Inference
```shell
# put your text prompts in the file passed to --txt_file
python Inference/inference.py --image_size=1024 \
    --t5_path "./model" \
    --tokenizer_path "./model/sd-vae-ft-ema" \
    --txt_file "text_prompts.txt" \
    --model_path "model/Edgen_1024px_v1.pth" \
    --save_folder "output/test_model"
```
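The inference command reads its prompts from the file given to `--txt_file`. Below is a minimal sketch of preparing that file. It assumes the script expects one prompt per line, which this README does not confirm, and the two prompts are purely illustrative.

```shell
# Assumption: inference.py reads one prompt per line from the --txt_file path.
# The prompts below are illustrative placeholders; replace them with your own.
cat > text_prompts.txt <<'EOF'
A watercolor painting of a lighthouse at sunrise
A close-up photo of a red fox in the snow
EOF

# Sanity check: print how many prompts were written
wc -l < text_prompts.txt
```

If the script expects a different layout, adjust the file accordingly before running inference.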
## Citation
```bibtex
@article{zhao2024evolvedirector,
title={EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models},
author={Zhao, Rui and Yuan, Hangjie and Wei, Yujie and Zhang, Shiwei and Gu, Yuchao and Ran, Lingmin and Wang, Xiang and Wu, Zhangjie and Zhang, Junhao and Zhang, Yingya and others},
journal={arXiv preprint arXiv:2410.07133},
year={2024}
}
```
## Shoutouts
- This code builds heavily on [PixArt-$\alpha$](https://github.com/PixArt-alpha/PixArt-alpha/). Thanks for open-sourcing!