
Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Code: https://github.com/TencentARC/Open-MAGVIT2

Paper: https://huggingface.co/papers/2409.04410

Introduction

VQGAN, the original tokenizer, still plays an indispensable role in mainstream tasks, especially autoregressive (AR) visual generation. Because VQGAN is limited by its codebook size and by how few of its codes are actually used, the capability of AR generation built on it is underestimated.

To address this, MAGVIT-2 proposes a powerful tokenizer for visual generation tasks. It introduces a novel lookup-free quantization technique and extends the codebook size to $2^{18}$, showing promising performance in both image and video generation. It also plays an important role in VideoPoet, a recent state-of-the-art AR video generation model. However, this strong tokenizer has not been made publicly available so far. ☹️
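To make the lookup-free idea concrete, here is a minimal sketch, assuming the sign-based formulation in which each latent channel is quantized independently to -1 or +1 and the token index is read off as a binary number. With 18 channels this gives $2^{18}$ implicit codes without storing a codebook. The function names and `NUM_BITS` are hypothetical and are not taken from this repository; training details such as gradient handling and auxiliary losses are omitted.

```python
from typing import List, Tuple

# Assumption for illustration: 18 latent channels, so 2**18 = 262144 implicit codes.
NUM_BITS = 18


def lfq_quantize(z: List[float]) -> Tuple[List[float], int]:
    """Quantize one latent vector without a codebook lookup.

    Each channel is mapped to -1.0 or +1.0 by its sign. The token index is
    the integer whose i-th bit is set when channel i is positive.
    """
    codes = [1.0 if v > 0 else -1.0 for v in z]
    index = sum(1 << i for i, v in enumerate(z) if v > 0)
    return codes, index


def lfq_dequantize(index: int, num_bits: int = NUM_BITS) -> List[float]:
    """Recover the +/-1 code vector from a token index."""
    return [1.0 if (index >> i) & 1 else -1.0 for i in range(num_bits)]


# Usage: channels 0 and 2 are positive, all others negative.
z = [0.3, -1.2, 0.8] + [-0.5] * (NUM_BITS - 3)
codes, index = lfq_quantize(z)
assert lfq_dequantize(index) == codes
```

Because the index is computed directly from the signs, the vocabulary grows exponentially with the number of channels while the quantizer itself has no embedding table to search.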

In this codebase, we follow the key insights of the MAGVIT-2 tokenizer design and re-implement it in PyTorch, achieving the closest results to the original so far. We hope that our effort can foster innovation and creativity within the field of autoregressive visual generation. 😄

ImageNet 128 × 128:

ImageNet 256 × 256:

Usage

Refer to the GitHub repository, which includes scripts for training, evaluation, and inference.
