---
library_name: diffusers
license: mit
pipeline_tag: unconditional-image-generation
---

# Autoregressive Image Generation without Vector Quantization

## About

This model (MAR) introduces a novel approach to autoregressive image generation that eliminates the need for vector quantization. Instead of relying on discrete tokens, the model operates in a continuous-valued space, using a diffusion process to model the per-token probability distribution. By employing a Diffusion Loss function, it achieves efficient, high-quality image generation while retaining the speed advantages of autoregressive sequence modeling. This approach simplifies the generation process and makes it applicable to continuous-valued domains beyond image synthesis.

It is based on the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).
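
To make the Diffusion Loss idea concrete, here is a minimal conceptual sketch, not the repository's implementation: a small MLP predicts the noise added to a continuous token, conditioned on the vector produced by the autoregressive backbone. The dimensions, noising schedule, and class name below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiffusionLossSketch(nn.Module):
    """Illustrative per-token diffusion loss on continuous-valued tokens."""

    def __init__(self, token_dim=16, cond_dim=768, hidden_dim=1024):
        super().__init__()
        # small MLP denoiser conditioned on the AR output and the timestep
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, x, z):
        # x: (B, token_dim) ground-truth continuous token
        # z: (B, cond_dim) conditioning vector from the autoregressive model
        t = torch.rand(x.shape[0], 1)            # random diffusion time in [0, 1]
        noise = torch.randn_like(x)              # noise the network must predict
        x_t = (1.0 - t) * x + t * noise          # toy noising schedule (assumption)
        eps_pred = self.net(torch.cat([x_t, z, t], dim=-1))
        return ((eps_pred - noise) ** 2).mean()  # MSE between predicted and true noise
```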

## Usage

You can load the model through the Hugging Face `DiffusionPipeline` and optionally customize parameters such as the model type, the number of autoregressive steps, and the class labels.

```python
from diffusers import DiffusionPipeline

# load the pretrained model with its custom pipeline code
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/mar",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/mar",
)

# generate an image with the model
generated_image = pipeline(
    model_type="mar_huge",         # choose from 'mar_base', 'mar_large', or 'mar_huge'
    seed=42,                       # set a seed for reproducibility
    num_ar_steps=64,               # number of autoregressive steps
    class_labels=[207, 360, 388],  # provide valid ImageNet class labels
    cfg_scale=4,                   # classifier-free guidance scale
    output_dir="./images",         # directory to save generated images
    cfg_schedule="constant",       # choose between 'constant' (suggested) and 'linear'
)

# display the generated image
generated_image.show()
```

<p align="center">
  <img src="https://github.com/LTH14/mar/raw/main/demo/visual.png" width="500">
</p>

This code loads the model, generates images for the given class labels, and saves them to the specified output directory.
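
If you prefer to work with the files written to `output_dir` rather than the returned image, a small follow-up like the sketch below can help. It assumes the pipeline writes standard PNG files under `./images`; adjust the glob pattern if your output differs.

```python
from pathlib import Path
from PIL import Image

# Assumption: the pipeline writes one PNG per generated image into ./images
for path in sorted(Path("./images").glob("*.png")):
    img = Image.open(path)
    print(path.name, img.size)
    img.show()
```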

We offer three pre-trained MAR models in `safetensors` format:

- `mar-base.safetensors`
- `mar-large.safetensors`
- `mar-huge.safetensors`
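
To compare the variants side by side, you can call the pipeline once per model type. The sketch below reuses the `pipeline` object and arguments from the snippet above; the per-variant output directories are an illustrative choice, not part of the pipeline's API.

```python
# Sketch: generate the same class with each released MAR variant,
# reusing the `pipeline` loaded earlier in this card.
for model_type in ["mar_base", "mar_large", "mar_huge"]:
    pipeline(
        model_type=model_type,
        seed=42,
        num_ar_steps=64,
        class_labels=[207],              # ImageNet class 207 (golden retriever)
        cfg_scale=4,
        output_dir=f"./images/{model_type}",
        cfg_schedule="constant",
    )
```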

This is a Hugging Face Diffusers (GPU) implementation of the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).

The official PyTorch implementation is released in [this repository](https://github.com/LTH14/mar).

## Citation

```bibtex
@article{li2024autoregressive,
  title={Autoregressive Image Generation without Vector Quantization},
  author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
  journal={arXiv preprint arXiv:2406.11838},
  year={2024}
}
```

## Acknowledgements

We thank Congyue Deng and Xinlei Chen for helpful discussion. We thank Google TPU Research Cloud (TRC) for granting us access to TPUs, and Google Cloud Platform for supporting GPU resources.

A large portion of the code in this repo is based on [MAE](https://github.com/facebookresearch/mae), [MAGE](https://github.com/LTH14/mage) and [DiT](https://github.com/facebookresearch/DiT).

## Contact

If you have any questions, feel free to contact me through email (tianhong@mit.edu). Enjoy!