Monetico: An Efficient Reproduction of Meissonic for Text-to-Image Synthesis

Introduction

Like Meissonic, Monetico is a non-autoregressive, masked-image-modeling text-to-image synthesis model capable of generating high-resolution images, and it is designed to run efficiently on consumer-grade graphics cards.

Monetico is an efficient reproduction of Meissonic. Trained on 8 H100 GPUs for approximately one week, it generates high-quality 512x512 images comparable to those produced by Meissonic and SDXL.
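
At a high level, non-autoregressive masked image modeling generates an image by starting from a fully masked grid of discrete visual tokens and filling it in over a small number of refinement steps, keeping only the most confident predictions at each step. The sketch below is a generic illustration of this MaskGIT-style decoding loop, not Monetico's actual implementation; the `model` callable, its signature, and the default sizes are hypothetical placeholders.

```python
import torch

def masked_decode(model, text_emb, num_tokens=1024, steps=16, mask_id=0):
    """Generic MaskGIT-style non-autoregressive decoding loop (illustrative only).

    `model(tokens, text_emb)` is assumed to return logits of shape
    (batch, num_tokens, vocab_size); this is a placeholder, not Monetico's API.
    """
    # Start from a fully masked token grid.
    tokens = torch.full((1, num_tokens), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens, text_emb)                  # (1, num_tokens, vocab_size)
        confidence, prediction = logits.softmax(-1).max(-1)
        still_masked = tokens == mask_id
        # Unmask enough tokens to stay on a linear schedule across `steps`.
        target_unmasked = num_tokens * (step + 1) // steps
        num_to_unmask = target_unmasked - int((~still_masked).sum())
        if num_to_unmask > 0:
            # Only consider positions that are still masked.
            conf = confidence.masked_fill(~still_masked, float("-inf"))
            idx = conf.topk(num_to_unmask, dim=-1).indices
            tokens.scatter_(1, idx, prediction.gather(1, idx))
    return tokens  # discrete visual tokens, decoded to pixels by a VQ decoder
```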

Monetico was developed by Collov Labs. We extend our gratitude to @MeissonFlow and @viiika for their valuable advice on efficient training.

Usage

For detailed usage instructions, please refer to the GitHub repository.
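
As a quick starting point, the model weights can be fetched from the Hugging Face Hub with `huggingface_hub`. The snippet below only downloads the checkpoint; it assumes the inference code itself is taken from the GitHub repository.

```python
# Minimal sketch: download the Monetico checkpoint from the Hugging Face Hub.
# Run the inference scripts from the GitHub repository against these files.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Collov-Labs/Monetico")
print(f"Model files downloaded to: {local_dir}")
```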

Citation

If you find this work helpful, please consider citing:

@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}