nielsr HF staff committed on
Commit
1e5f79f
1 Parent(s): 28b1da4

Add model card

This PR adds the appropriate pipeline tag and links the model to its paper page, https://huggingface.co/papers/2411.10433, so that the model is listed there.
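
For reviewers who want to check the metadata locally, the front matter added by this PR can be read with a few lines of Python. This is a minimal sketch, not the Hub's own parser: `read_front_matter` is a hypothetical helper that only handles flat top-level `key: value` pairs, and the `README` string is a shortened stand-in for the real file.

```python
def read_front_matter(text):
    """Return top-level `key: value` pairs from a leading `---` block.

    Deliberately not a YAML parser: nested values and lists are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines and lines without a colon.
        if sep and key.strip() and not line.startswith((" ", "\t")):
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the file as having no front matter


# Shortened stand-in for the README.md added in this PR.
README = """---
pipeline_tag: unconditional-image-generation
---

This repository contains the model ...
"""

meta = read_front_matter(README)
print(meta.get("pipeline_tag"))
```

For real model cards, a full YAML parser is the safer choice, since card metadata often contains lists and nested fields that this sketch skips.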

Files changed (1)
  1. README.md +13 -0
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ pipeline_tag: unconditional-image-generation
+ ---
+
+ This repository contains the model presented in [M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation](https://huggingface.co/papers/2411.10433).
+
+ ## Introduction
+
+ Recent work in computer vision, VAR, proposes a new autoregressive paradigm for image generation. Diverging from vanilla next-token prediction, VAR structurally reformulates image generation as coarse-to-fine next-scale prediction. In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into intra-scale modeling, which captures local spatial dependencies within each scale, and inter-scale modeling, which models cross-scale relationships progressively from coarse to fine scales. This decoupled structure allows us to rebuild VAR in a more computationally efficient manner. Specifically, for intra-scale modeling, which is crucial for generating high-fidelity images, we retain the original bidirectional self-attention design to ensure comprehensive modeling; for inter-scale modeling, which semantically connects different scales but is computationally intensive, we apply linear-complexity mechanisms such as Mamba to substantially reduce computational overhead. We term this new framework M-VAR. Extensive experiments demonstrate that our method outperforms existing models in both image quality and generation speed. For example, our 1.5B model, with fewer parameters and faster inference speed, outperforms the largest VAR-d30-2B. Moreover, our largest model, M-VAR-d32, registers 1.78 FID on ImageNet 256×256 and outperforms the prior-art autoregressive models LlamaGen/VAR by 0.4/0.19 and the popular diffusion models LDM/DiT by 1.82/0.49, respectively.
+
+ ## Usage
+
+ See the GitHub repository: https://github.com/OliverRensu/MVAR.
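
The intra-scale / inter-scale split described in the Introduction of the added README can be illustrated by counting which query-key pairs fall into each part of a scale-wise (block-causal) attention pattern. The sketch below is only an illustration under stated assumptions, not the M-VAR implementation: `PATCH_NUMS` is an assumed scale schedule (side length of each token map) that is not taken from this PR, and `scale_ids` and `count_pairs` are hypothetical helper names.

```python
# Assumed scale schedule: side length of the token map at each scale.
# This is an illustrative choice, not a value stated in this PR.
PATCH_NUMS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def scale_ids(patch_nums):
    """Map every token position to the index of the scale it belongs to."""
    ids = []
    for scale, side in enumerate(patch_nums):
        ids.extend([scale] * (side * side))
    return ids


def count_pairs(patch_nums):
    """Count attended (query, key) pairs in a block-causal pattern.

    intra: query and key lie in the same scale (bidirectional within a scale).
    inter: key lies in a strictly coarser scale than the query.
    Pairs where the key is in a finer scale are masked and not counted.
    """
    ids = scale_ids(patch_nums)
    intra = inter = 0
    for q_scale in ids:
        for k_scale in ids:
            if k_scale == q_scale:
                intra += 1
            elif k_scale < q_scale:
                inter += 1
    return len(ids), intra, inter


tokens, intra, inter = count_pairs(PATCH_NUMS)
print(f"tokens={tokens} intra_pairs={intra} inter_pairs={inter}")
print(f"inter-scale share of attended pairs: {inter / (intra + inter):.2f}")
```

Under this assumed schedule, the inter-scale part covers more attended pairs than the intra-scale part, which is consistent with the abstract's description of inter-scale modeling as the computationally intensive component. The count only describes the attention pattern being replaced; it does not measure the cost of a Mamba-style layer.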