---
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: MaskGIT-Tokenizer-10bits
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 1.96
    - name: InceptionScore
      type: InceptionScore
      value: 178.3
    - name: LPIPS
      type: LPIPS
      value: 0.331
    - name: PSNR
      type: PSNR
      value: 18.6
    - name: SSIM
      type: SSIM
      value: 0.47
    - name: CodebookUsage
      type: CodebookUsage
      value: 1.0
---

This model is the MaskGIT tokenizer with a 10-bit vocabulary (2^10 = 1024 codebook entries), adapted for use in the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
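
To make the numbers above concrete, here is a minimal sketch of the arithmetic they imply. The function name `token_grid` and its parameters are hypothetical and are not part of the MaskGIT or MaskBit codebases; the sketch only assumes a square input and an integer downsampling factor.

```python
def token_grid(image_size: int = 256, downsample: int = 16, bits: int = 10):
    """Derive token-grid statistics from the tokenizer settings (illustrative only)."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample   # tokens along one spatial axis
    num_tokens = side * side          # tokens per image
    vocab_size = 2 ** bits            # number of codebook entries
    total_bits = num_tokens * bits    # bits needed to store one tokenized image
    return side, num_tokens, vocab_size, total_bits


side, num_tokens, vocab_size, total_bits = token_grid()
print(f"{side}x{side} grid, {num_tokens} tokens, "
      f"{vocab_size} codebook entries, {total_bits} bits per image")
```

With the settings described here, a 256×256 image maps to a 16×16 grid of 256 tokens, each indexing one of 1024 codebook entries.
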
You can find more details in the original [repository](https://github.com/google-research/maskgit) and in the [paper](https://arxiv.org/abs/2202.04200). All credit for this model belongs to Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T. Freeman.