---
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
---
# You Only Sample Once (YOSO)

![overview](overview.jpg)

YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by Yihong Luo, Xiaolong Chen, Xinghua Qu, and Jing Tang.

Official repository for the paper: [YOSO](https://github.com/Luo-Yihong/YOSO).

This model is fine-tuned from [PixArt-XL-2-512x512](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512) and performs text-to-image generation with one-step inference.

Note that YOSO-PixArt was originally trained at 512 resolution. However, we found that we can construct a YOSO model that generates samples at 1024 resolution by merging it with [PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) (Section 6.3.1 in the paper). The resulting performance indicates the robust generalization ability of YOSO.
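
The exact merging recipe is described in Section 6.3.1 of the paper and is not reproduced here. Purely as an illustration of what weight merging can look like, the sketch below assumes a task-vector style merge: the difference between the YOSO weights and their 512 base is added onto the 1024 base, key by key. The function name `merge_state_dicts`, the toy weights, and the merge rule itself are assumptions made for this example and may differ from the authors' procedure.

```python
def merge_state_dicts(base_512, yoso_512, base_1024):
    """Add the (yoso_512 - base_512) delta onto base_1024, key by key.

    Weights are plain lists of floats here to keep the sketch dependency-free;
    with real checkpoints these would be tensors of matching shapes.
    """
    merged = {}
    for name, w_1024 in base_1024.items():
        if name in base_512 and name in yoso_512:
            # Assumed rule: shift the 1024 weights by the fine-tuning delta.
            merged[name] = [
                w + (y - b)
                for w, y, b in zip(w_1024, yoso_512[name], base_512[name])
            ]
        else:
            # Keys that exist only in the 1024 model are copied unchanged.
            merged[name] = list(w_1024)
    return merged


# Toy, hypothetical weights (not taken from any real checkpoint).
base_512 = {"w": [1.0, 2.0]}
yoso_512 = {"w": [1.5, 1.0]}
base_1024 = {"w": [3.0, 4.0], "pos": [0.25]}

merged = merge_state_dicts(base_512, yoso_512, base_1024)
```

The released `Luo-Yihong/yoso_pixart1024` checkpoint already contains the merged weights, so this step is not needed to run the model.
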
## Usage
```python
import torch
from diffusers import PixArtAlphaPipeline, LCMScheduler, Transformer2DModel

# Load the YOSO transformer weights in half precision.
transformer = Transformer2DModel.from_pretrained(
    "Luo-Yihong/yoso_pixart1024", torch_dtype=torch.float16
).to("cuda")

# Take the remaining pipeline components from PixArt-alpha
# and swap in the YOSO transformer.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512",
    transformer=transformer,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe = pipe.to("cuda")

# Use the LCM scheduler with v-prediction for one-step sampling.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.config.prediction_type = "v_prediction"

generator = torch.manual_seed(318)
imgs = pipe(
    prompt="Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
    num_inference_steps=1,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.0,  # no classifier-free guidance
)[0]
imgs[0]
```
![Ship](ship_1024.jpg)
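
The usage snippet passes `guidance_scale=1.`. Under the standard classifier-free guidance formula, `uncond + scale * (cond - uncond)`, a scale of 1 returns the conditional prediction unchanged, which is why it amounts to sampling without guidance. The following is a minimal sketch on plain Python lists; `apply_guidance` and the toy values are hypothetical and stand in for the model's noise predictions.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance, elementwise: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the unconditional and text-conditional predictions.
uncond = [0.0, 1.0, -2.0]
cond = [0.5, 3.0, -1.0]

unguided = apply_guidance(uncond, cond, 1.0)  # identical to cond
guided = apply_guidance(uncond, cond, 4.5)    # pushed further away from uncond
```

With a scale of 1 the unconditional branch has no effect on the result, so a one-step sample needs only a single conditional forward pass.
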

## Bibtex
```
@misc{luo2024sample,
  title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs},
  author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
  year={2024},
  eprint={2403.12931},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```