	---
	pipeline_tag: text-to-image
	---

	<p align="center">
	<img src="assets/generated_logo.jpg" height=120>
	</p>

	### <div align="center">EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models</div>

	---

	This repository hosts the official weights of the model "Edgen", trained with EvolveDirector. For more details, please refer to our paper and code repo.


	## Setup

	### Requirements

	1. Create a virtual environment for EvolveDirector and install the dependencies
	```shell
	# create virtual environment for EvolveDirector
	conda create -n evolvedirector python=3.9
	conda activate evolvedirector

	# cd to the path of this repo

	# install packages
	pip install --upgrade pip
	pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
	pip install -r requirements.txt
	pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
	```
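After the install steps above, a quick check can confirm which PyTorch build is active in the environment. This is only a minimal sketch: the variable name `torch_status` is illustrative, and it assumes `python` resolves to the interpreter of the activated `evolvedirector` environment.

```shell
# Sanity check (illustrative): report the active PyTorch version and whether CUDA is visible.
# Assumes `python` points at the interpreter of the activated conda environment.
if python -c "import torch" 2>/dev/null; then
    torch_status=$(python -c "import torch; print(torch.__version__, torch.cuda.is_available())")
else
    torch_status="torch is not importable; re-run the install steps"
fi
echo "$torch_status"
```

If the reported version differs from the pinned `2.1.1`, one of the later `pip install` commands may have replaced it, and re-running the pinned install line is a reasonable first step.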

	## Usage

	1. Inference
	```shell
	# put your text prompts in text_prompts.txt before running
	python Inference/inference.py --image_size=1024 \
	    --t5_path "./model" \
	    --tokenizer_path "./model/sd-vae-ft-ema" \
	    --txt_file "text_prompts.txt" \
	    --model_path "model/Edgen_1024px_v1.pth" \
	    --save_folder "output/test_model"
	```
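Before launching the inference command above, it can help to prepare the prompt file and confirm that the referenced paths exist. The sketch below rests on two assumptions that the README does not confirm: that `text_prompts.txt` holds one prompt per line, and that the two prompts shown are acceptable placeholders. The `missing` counter is an illustrative name.

```shell
# Assumption: text_prompts.txt holds one prompt per line; confirm the
# expected format in Inference/inference.py before relying on this.
# The two prompts below are placeholders, not examples from the authors.
printf '%s\n' \
    "A watercolor painting of a lighthouse at sunrise" \
    "A close-up photo of a red fox in the snow" \
    > text_prompts.txt

# Report any path used by the inference command that does not exist yet.
missing=0
for path in "./model" "./model/sd-vae-ft-ema" "model/Edgen_1024px_v1.pth"; do
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        missing=$((missing + 1))
    fi
done
echo "missing paths: $missing"
```

A non-zero count suggests that the weights or auxiliary models have not been downloaded into `./model` yet.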


	## Citation


	```bibtex
	@article{zhao2024evolvedirector,
	title={EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models},
	author={Zhao, Rui and Yuan, Hangjie and Wei, Yujie and Zhang, Shiwei and Gu, Yuchao and Ran, Lingmin and Wang, Xiang and Wu, Zhangjie and Zhang, Junhao and Zhang, Yingya and others},
	journal={arXiv preprint arXiv:2410.07133},
	year={2024}
	}
	```

	## Shoutouts

	- This code builds heavily on [PixArt-$\alpha$](https://github.com/PixArt-alpha/PixArt-alpha/). Thanks for open-sourcing!