Update README.md
README.md CHANGED

@@ -14,7 +14,7 @@ pipeline_tag: unconditional-image-generation
 ---
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="
+<img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="60" />
 </p>
 
 <h2 align="center">PixelDiT: Pixel Diffusion Transformers for Image Generation</h2>
@@ -41,10 +41,6 @@ pipeline_tag: unconditional-image-generation
 <a href="https://github.com/NVlabs/PixelDiT"><img src="https://img.shields.io/badge/GitHub-Code-blue" /></a>
 </p>
 
-## Model Overview
-
-**PixelDiT-XL** (797M parameters) is a class-conditional image generation model trained on ImageNet, operating directly in **pixel space** — no VAE, no latent space. It uses a dual-level architecture combining a patch-level DiT for global semantics with a pixel-level DiT for fine texture details.
-
 ## Pre-trained Checkpoints
 
 | Checkpoint | Resolution | Epochs | gFID | CFG Scale | Time Shift | CFG Interval |
@@ -52,7 +48,7 @@ pipeline_tag: unconditional-image-generation
 | `imagenet256_pixeldit_xl_epoch80.ckpt` | 256x256 | 80 | **2.36** | 3.25 | 1.0 | [0.1, 1.0] |
 | `imagenet256_pixeldit_xl_epoch160.ckpt` | 256x256 | 160 | **1.97** | 3.25 | 1.0 | [0.1, 1.0] |
 | `imagenet256_pixeldit_xl_epoch320.ckpt` | 256x256 | 320 | **1.61** | 2.75 | 1.0 | [0.1, 0.9] |
-| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.
+| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.81** | 3.5 | 2.0 | [0.1, 1.0] |
 
 All evaluations use **FlowDPMSolver** with **100 steps**. 50K samples. Metrics follow the ADM evaluation protocol.
@@ -96,20 +92,6 @@ torchrun --nproc_per_node=8 main.py predict \
 
 After generating samples, compute FID with the [ADM evaluation toolkit](https://github.com/openai/guided-diffusion/tree/main/evaluations).
 
-## Model Architecture
-
-| Component | Value |
-|-----------|-------|
-| Parameters | 797M |
-| Input channels | 3 (RGB) |
-| Patch size | 16 |
-| Hidden size | 1152 |
-| Attention heads | 16 |
-| Patch-level depth | 26 |
-| Pixel-level depth | 4 |
-| Pixel hidden size | 16 |
-| Classes | 1000 (ImageNet) |
-
 ## Citation
 
 ```bibtex
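One detail worth unpacking from the checkpoint table in this diff is the `CFG Interval` column: classifier-free guidance is applied only while the sampling time lies inside the listed interval, and the plain conditional prediction is used elsewhere. A minimal sketch of interval-gated CFG, assuming the interval is over normalized flow time t in [0, 1] (the function name and signature are illustrative, not from the PixelDiT codebase):

```python
def guided_velocity(v_cond, v_uncond, t, cfg_scale=2.75, interval=(0.1, 0.9)):
    """Interval-gated classifier-free guidance (illustrative sketch).

    Inside `interval` (assumed to span normalized flow time t in [0, 1]),
    the usual CFG combination is applied; outside it, the conditional
    velocity prediction is returned unguided.
    """
    lo, hi = interval
    if lo <= t <= hi:
        # Standard CFG: extrapolate from unconditional toward conditional.
        return v_uncond + cfg_scale * (v_cond - v_uncond)
    # Outside the interval: fall back to the unguided conditional prediction.
    return v_cond
```

Under this reading, the epoch-320 checkpoint's row corresponds to `cfg_scale=2.75, interval=(0.1, 0.9)`, so guidance is skipped near t = 0 and t = 1.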