FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Abstract
Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we revisit the conventional static paradigm that allocates a fixed compute budget per denoising iteration and propose a dynamic strategy instead. Our simple and sample-efficient framework enables pre-trained DiT models to be converted into flexible ones -- dubbed FlexiDiT -- allowing them to process inputs at varying compute budgets. We demonstrate how a single flexible model can generate images without any drop in quality, while reducing the required FLOPs by more than 40% compared to its static counterpart, for both class-conditioned and text-conditioned image generation. Our method is general and agnostic to input and conditioning modalities. We show how our approach can be readily extended to video generation, where FlexiDiT models generate samples with up to 75% less compute without compromising performance.
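The core idea in the abstract, choosing a compute budget per denoising step rather than running every step at full cost, can be sketched in a few lines. The sketch below is not the authors' implementation: the class `FlexiDenoiser`, the function `budget_schedule`, and the choice of varying the token count (e.g., via patch size) are all illustrative assumptions, since the abstract does not specify the mechanism.

```python
# Minimal sketch of per-step dynamic compute in a denoising loop.
# All names here are hypothetical; only the control flow is the point.
import torch
import torch.nn as nn


class FlexiDenoiser(nn.Module):
    """Toy stand-in for a DiT that can run at different token counts.

    Fewer tokens (e.g., from a larger patch size) means a cheaper
    forward pass; the same weights serve every budget.
    """

    def __init__(self, dim: int = 64):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.block(tokens)


def budget_schedule(step: int, num_steps: int) -> int:
    # Hypothetical schedule: run the first half of the denoising steps
    # at full compute and the second half at a reduced budget. The
    # abstract prescribes no schedule; a real one would be tuned.
    return 256 if step < num_steps // 2 else 64  # token counts


def sample(model: FlexiDenoiser, num_steps: int = 10, dim: int = 64) -> None:
    for step in range(num_steps):
        n_tokens = budget_schedule(step, num_steps)
        tokens = torch.randn(1, n_tokens, dim)  # stand-in for patched latents
        _ = model(tokens)  # one denoising step at the chosen budget
        print(f"step {step}: {n_tokens} tokens")


if __name__ == "__main__":
    sample(FlexiDenoiser())
```

Note that a single model handles every budget in the loop, which is what lets one flexible model replace several static ones, as the abstract claims.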
Community
The following similar papers were recommended by the Semantic Scholar API:
- Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens (2025)
- MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer (2025)
- FlexControl: Computation-Aware ControlNet with Differentiable Router for Text-to-Image Generation (2025)
- Accelerate High-Quality Diffusion Models with Inner Loop Feedback (2025)
- CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation (2025)
- E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization (2025)
- CAT: Content-Adaptive Image Tokenization (2025)