tliby committed
Commit 16e7250 · verified · 1 Parent(s): 6d552c2

upload readme

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +167 -3
  3. assets/pipeline.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/pipeline.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,167 @@
----
-license: mit
----
# TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

<p align="center">
<a href=""><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="arXiv"></a>&nbsp;
<a href="https://huggingface.co/inclusionAI/TC-AE/tree/main"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow" alt="Models"></a>
</p>
<div align="center">
<a href="https://tliby.github.io/" target="_blank">Teng&nbsp;Li</a><sup>1,2*</sup>,
<a href="https://huang-ziyuan.github.io/" target="_blank">Ziyuan&nbsp;Huang</a><sup>1,*,✉</sup>,
<a href="https://scholar.google.com/citations?user=kwDXTpAAAAAJ&hl=en" target="_blank">Cong&nbsp;Chen</a><sup>1,3,*</sup>,
<a href="https://ychenl.github.io/" target="_blank">Yangfu&nbsp;Li</a><sup>1,4</sup>,
<a href="https://qc-ly.github.io/" target="_blank">Yuanhuiyi&nbsp;Lyu</a><sup>1,5</sup>, <br>
<a href="#" target="_blank">Dandan&nbsp;Zheng</a><sup>1</sup>,
<a href="https://scholar.google.com/citations?user=Ljk2BvIAAAAJ&hl=en" target="_blank">Chunhua&nbsp;Shen</a><sup>3</sup>,
<a href="https://eejzhang.people.ust.hk/" target="_blank">Jun&nbsp;Zhang</a><sup>2✉</sup><br>
<sup>1</sup>Inclusion AI, Ant Group, <sup>2</sup>HKUST, <sup>3</sup>ZJU, <sup>4</sup>ECNU, <sup>5</sup>HKUST (GZ) <br>
<sup>*</sup>Equal contribution, ✉ Corresponding authors <br>
</div>

## News

- [2026/03/30] Research paper, code, and models are released for TC-AE!

## Introduction

<p align="center">
<img src="assets/pipeline.png" width=98%>
</p>

**TC-AE** is a novel Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. Traditional deep compression methods typically increase channel dimensions to maintain reconstruction quality at high compression ratios, but this often leads to representation collapse that degrades generative performance. TC-AE addresses this fundamental challenge from a new perspective: **optimizing the token space**, the critical bridge between pixels and latent representations. By scaling the number of tokens and enhancing their semantic structure, TC-AE achieves superior reconstruction and generation quality.

Key innovations:

- **Token space optimization**: the first approach to address representation collapse through token space optimization
- **Staged token compression**: decomposes the token-to-latent mapping into two stages, reducing structural information loss in the bottleneck
- **Semantic enhancement**: incorporates self-supervised learning to produce more generation-friendly latents

🚀 In this codebase, we release:

- Pre-trained TC-AE tokenizer weights and evaluation code
- Diffusion model training and evaluation code

## Environment Setup

To set up the environment for TC-AE, follow these steps:

```shell
conda create -n tcae python=3.9
conda activate tcae
pip install -r requirements.txt
```
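As a quick sanity check after activation (assuming `python` resolves inside the `tcae` env), the interpreter version should match the pin above:

```shell
# Print the major.minor version the activated env resolved to (expect 3.9)
python -c "import sys; print('%d.%d' % sys.version_info[:2])"
```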

## Download Checkpoints

Download the pre-trained TC-AE weights and place them in the `results/` directory:

| Tokenizer | Compression Ratio | rFID | LPIPS | Pretrained Weights |
| --------- | ----------------- | ---- | ----- | ------------------ |
| TC-AE-SL | f32d128 | 0.35 | 0.060 | [![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow)](https://huggingface.co/inclusionAI/TC-AE/tree/main) |

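If the `f32d128` tag in the table above follows the common `f{factor}d{dim}` convention (an assumption here; the paper defines it precisely), it means 32× spatial downsampling with 128-dimensional latents, so a 256×256 RGB image maps to an 8×8 latent grid:

```shell
# Assumed reading of f32d128: 32x spatial downsampling, 128-dim latents.
# A 256x256x3 image then becomes an 8x8x128 latent tensor:
python3 -c "print(256 // 32, 256 // 32, 128)"        # prints: 8 8 128
python3 -c "print((256 * 256 * 3) / (8 * 8 * 128))"  # prints: 24.0 (pixels per latent number)
```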

## Reconstruction Evaluation

##### Image Reconstruction Demo

```shell
python tcae/script/demo_recon.py \
    --img_folder /path/to/your/images \
    --output_folder /path/to/output \
    --ckpt_path results/tcae.pt \
    --config configs/TC-AE-SL.yaml \
    --rank 0
```

##### ImageNet Evaluation

Evaluate reconstruction quality on the ImageNet validation set:

```shell
python tcae/script/eval_recon.py \
    --ckpt_path results/tcae.pt \
    --dataset_root /path/to/imagenet_val \
    --config configs/TC-AE-SL.yaml \
    --rank 0
```

## Generation Evaluation

Our DiT architecture and training pipeline are based on [RAE](https://github.com/bytetriper/RAE) and [VA-VAE](https://github.com/hustvl/LightningDiT).

##### Prepare ImageNet Latents for Training

Extract and cache latent representations from the ImageNet training set:

```shell
accelerate launch \
    --mixed_precision bf16 \
    diffusion/script/extract_features.py \
    --data_path /path/to/imagenet_train \
    --batch_size 50 \
    --tokenizer_cfg_path configs/TC-AE-SL.yaml \
    --tokenizer_ckpt_path results/tcae.pt
```

This will cache latents to `results/cached_latents/imagenet_train_256/`.
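Before launching training, it can be worth confirming the cache was actually written. A minimal check (the file count depends on your sharding, so no specific number is assumed):

```shell
# Count cached latent files under the directory reported above
CACHE_DIR=results/cached_latents/imagenet_train_256
find "$CACHE_DIR" -type f 2>/dev/null | wc -l
```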

##### Training

Train a DiT-XL model on the extracted latents:

```shell
mkdir -p results/dit
torchrun --standalone --nproc_per_node=8 \
    diffusion/script/train_dit.py \
    --config configs/DiT-XL.yaml \
    --data-path results/cached_latents/imagenet_train_256 \
    --results-dir results/dit \
    --image-size 256 \
    --precision bf16
```
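`--nproc_per_node=8` assumes eight local GPUs; on a different machine, match it to what is actually visible. A hypothetical helper (not part of the repo) that derives the count from `CUDA_VISIBLE_DEVICES`:

```shell
# Hypothetical helper: derive a process count from CUDA_VISIBLE_DEVICES
# (falls back to 0 when the variable is unset or empty)
NPROC=$(python3 -c "import os; v = os.environ.get('CUDA_VISIBLE_DEVICES', ''); print(len(v.split(',')) if v else 0)")
echo "nproc_per_node=$NPROC"
```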

##### Sampling

Generate images using the trained diffusion model:

```shell
mkdir -p results/dit/samples
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    diffusion/script/sample_ddp_dit.py \
    --config configs/DiT-XL.yaml \
    --sample-dir results/dit/samples \
    --precision bf16 \
    --label-sampling equal \
    --tokenizer_cfg_path configs/TC-AE-SL.yaml \
    --tokenizer_ckpt_path results/tcae.pt
```

##### Evaluation

Download the ImageNet reference statistics ([adm_in256_stats.npz](https://huggingface.co/jjiaweiyang/l-DeTok/commit/28ef58d254bb1bde10e331372fe542e5458f3b5f#d2h-232267)) and place the file in `results/`.

```shell
python diffusion/script/eval_dit.py \
    --generated_dir results/dit/samples/DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16 \
    --reference_npz results/adm_in256_stats.npz \
    --batch-size 512 \
    --num-workers 8
```
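FID-style metrics against the ADM reference batch are conventionally computed over 50,000 samples (an assumption here; check `eval_dit.py` for the count it expects). A quick pre-flight count of the generated images, assuming they are written as PNGs:

```shell
# Count generated images before scoring (PNG extension is an assumption)
SAMPLE_DIR=results/dit/samples/DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16
find "$SAMPLE_DIR" -name '*.png' 2>/dev/null | wc -l
```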

## Acknowledgements

The codebase is built on [HieraTok](https://arxiv.org/abs/2509.23736), [RAE](https://github.com/bytetriper/RAE), [VA-VAE](https://github.com/hustvl/LightningDiT), and [iBOT](https://github.com/bytedance/ibot). Thanks for their efforts!

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Citation

```

```
assets/pipeline.png ADDED

Git LFS Details

  • SHA256: 2978a7b71c2c59c18e00e1b532f9aa750b1313c22c196acabaa374915cd67e90
  • Pointer size: 131 Bytes
  • Size of remote file: 994 kB