arXiv:2405.14224

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Published on May 23

Abstract

Diffusion models have achieved great success in image generation, with their backbones evolving from U-Nets to Vision Transformers. However, the computational cost of Transformers is quadratic in the number of tokens, which poses significant challenges for high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based on State Space Models (SSMs), with the expressive power of diffusion models for efficient high-resolution image synthesis. To address the challenge that Mamba cannot generalize to 2D signals, we introduce several architectural designs, including multi-directional scans, learnable padding tokens at the end of each row and column, and lightweight local feature enhancement. Our DiM architecture achieves inference-time efficiency for high-resolution images. In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images (256×256) and then finetunes it on high-resolution images (512×512). We further explore training-free upsampling strategies that enable the model to generate higher-resolution images (e.g., 1024×1024 and 1536×1536) without additional finetuning. Experiments demonstrate the effectiveness and efficiency of our DiM.
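To make the scan designs in the abstract concrete, here is a minimal PyTorch sketch of multi-directional scanning over a 2D latent grid with learnable padding tokens closing each row and column. The module name, shapes, and the choice of four scan orders are illustrative assumptions based on the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn


class MultiDirectionalScan(nn.Module):
    """Sketch of DiM-style multi-directional scanning: flatten a 2D latent
    grid into 1D token sequences along four scan orders, appending a
    learnable padding token at the end of each row (or column) so the 1D
    state space model can detect line breaks. Hypothetical, for illustration."""

    def __init__(self, dim: int):
        super().__init__()
        # Hypothetical learnable "end of row" / "end of column" tokens.
        self.row_pad = nn.Parameter(torch.zeros(dim))
        self.col_pad = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (B, H, W, C) grid of latent tokens.
        B, H, W, C = x.shape

        # Row-major scan, one pad token closing every row -> (B, H*(W+1), C).
        rp = self.row_pad.view(1, 1, 1, C).expand(B, H, 1, C)
        row_seq = torch.cat([x, rp], dim=2).reshape(B, H * (W + 1), C)

        # Column-major scan, one pad token closing every column.
        cp = self.col_pad.view(1, 1, 1, C).expand(B, W, 1, C)
        col_seq = torch.cat([x.transpose(1, 2), cp], dim=2).reshape(B, W * (H + 1), C)

        # Four directions: forward and reversed versions of both scans.
        return [row_seq, row_seq.flip(1), col_seq, col_seq.flip(1)]


# Example: a 2-sample batch of 16x16 latent tokens with 64 channels.
seqs = MultiDirectionalScan(64)(torch.randn(2, 16, 16, 64))
```

In a full model, each flattened sequence would presumably be processed by a Mamba (SSM) block and the outputs merged back into the 2D grid; the lightweight local feature enhancement mentioned in the abstract could then act on that grid, e.g., as a small depthwise convolution.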
