arxiv:2311.09257

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Published on Nov 14, 2023

Submitted by akhaliq on Nov 17, 2023

#2 Paper of the day


Authors: Yang Zhao, Zhisheng Xiao, Tingbo Hou

Abstract

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.

* Work done as a student researcher at Google. † indicates equal contribution.


Community

multimodalart

Nov 17, 2023

The main figure presented in the paper

valhalla

Nov 17, 2023

I think the LCM (and LCM-LoRA) results are much better than they have showcased here.

Tybost

Nov 17, 2023

"I think the LCM (and LCM-LoRA) results are much better than they have showcased here."

Sure, it's capable of much better results at higher step counts... but at 1 or 2 steps, as shown in the example?

Vargol

Nov 17, 2023

I'd guess that they used the original LCM_Dreamshaper_v7 model, which produces results like that at 2 and 4 steps.
The distilled LCM SDXL produces much better images at 2 and 4 steps than the examples.