yongshengy committed (verified)
Commit 6584a95 · 1 parent: 3c41af3

Update README.md

Files changed (1): README.md (+2 -20)
README.md CHANGED
```diff
@@ -14,7 +14,7 @@ pipeline_tag: unconditional-image-generation
 ---
 
 <p align="center">
- <img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="120" />
+ <img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="60" />
 </p>
 
 <h2 align="center">PixelDiT: Pixel Diffusion Transformers for Image Generation</h2>
@@ -41,10 +41,6 @@ pipeline_tag: unconditional-image-generation
  <a href="https://github.com/NVlabs/PixelDiT"><img src="https://img.shields.io/badge/GitHub-Code-blue" /></a>
 </p>
 
-## Model Overview
-
-**PixelDiT-XL** (797M parameters) is a class-conditional image generation model trained on ImageNet, operating directly in **pixel space** — no VAE, no latent space. It uses a dual-level architecture combining a patch-level DiT for global semantics with a pixel-level DiT for fine texture details.
-
 ## Pre-trained Checkpoints
 
 | Checkpoint | Resolution | Epochs | gFID | CFG Scale | Time Shift | CFG Interval |
@@ -52,7 +48,7 @@ pipeline_tag: unconditional-image-generation
 | `imagenet256_pixeldit_xl_epoch80.ckpt` | 256x256 | 80 | **2.36** | 3.25 | 1.0 | [0.1, 1.0] |
 | `imagenet256_pixeldit_xl_epoch160.ckpt` | 256x256 | 160 | **1.97** | 3.25 | 1.0 | [0.1, 1.0] |
 | `imagenet256_pixeldit_xl_epoch320.ckpt` | 256x256 | 320 | **1.61** | 2.75 | 1.0 | [0.1, 0.9] |
-| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.78** | 3.5 | 2.0 | [0.1, 1.0] |
+| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.81** | 3.5 | 2.0 | [0.1, 1.0] |
 
 All evaluations use **FlowDPMSolver** with **100 steps**. 50K samples. Metrics follow the ADM evaluation protocol.
@@ -96,20 +92,6 @@ torchrun --nproc_per_node=8 main.py predict \
 
 After generating samples, compute FID with the [ADM evaluation toolkit](https://github.com/openai/guided-diffusion/tree/main/evaluations).
 
-## Model Architecture
-
-| Component | Value |
-|-----------|-------|
-| Parameters | 797M |
-| Input channels | 3 (RGB) |
-| Patch size | 16 |
-| Hidden size | 1152 |
-| Attention heads | 16 |
-| Patch-level depth | 26 |
-| Pixel-level depth | 4 |
-| Pixel hidden size | 16 |
-| Classes | 1000 (ImageNet) |
-
 ## Citation
 
 ```bibtex
```
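The checkpoint table in this diff pairs each model with a CFG scale and a CFG interval such as [0.1, 0.9]. As a rough illustration of how such an interval is commonly interpreted (guidance active only while the normalized diffusion time lies inside the interval), here is a minimal sketch; `guided_velocity` is a hypothetical helper for illustration, not PixelDiT's API:

```python
def guided_velocity(v_cond, v_uncond, t, cfg_scale, cfg_interval):
    """Blend conditional/unconditional predictions per a CFG interval.

    Assumption for illustration: classifier-free guidance is applied only
    when the normalized time t falls inside cfg_interval; outside it, the
    conditional prediction is returned unguided.
    """
    lo, hi = cfg_interval
    if lo <= t <= hi:
        # Standard classifier-free guidance blend.
        return v_uncond + cfg_scale * (v_cond - v_uncond)
    return v_cond  # Guidance disabled outside the interval.

# Example with the epoch-320 settings from the table: scale 2.75, interval [0.1, 0.9].
print(guided_velocity(1.0, 0.0, 0.5, 2.75, (0.1, 0.9)))   # inside interval: 2.75
print(guided_velocity(1.0, 0.0, 0.95, 2.75, (0.1, 0.9)))  # outside interval: 1.0
```

With scalar stand-ins for the model outputs, the guided value is `v_uncond + scale * (v_cond - v_uncond)` inside the interval and collapses back to `v_cond` outside it; in practice the same arithmetic would apply elementwise to velocity tensors.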