yangnianzu/CDT

This repository provides the pre-trained CDT-S and CDT-B models, which are stored in the pretrained folder. The inference code is available here.
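
As a convenience, the checkpoints can probably be fetched with the huggingface_hub client. The following is a minimal sketch, not part of the official inference code: the helper name download_pretrained and the default local directory are hypothetical, and the file names inside the pretrained folder are not stated here, so the pattern simply matches everything under that folder.

```python
from pathlib import Path

REPO_ID = "yangnianzu/CDT"         # repository name of this model card
ALLOW_PATTERNS = ["pretrained/*"]  # fetch only the pretrained folder; file names inside are unknown


def download_pretrained(local_dir: str = "CDT") -> Path:
    """Download the pretrained folder and return its local path (hypothetical helper)."""
    # Imported lazily so this module loads even when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
    return Path(root) / "pretrained"
```

Calling download_pretrained() requires network access and the huggingface_hub package (pip install huggingface_hub). Listing the returned directory afterwards shows which checkpoint files correspond to CDT-S and CDT-B.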

If you find this work useful in your research, please consider citing:

@article{yang2025rethinking,
  title={Rethinking Video Tokenization: A Conditioned Diffusion-based Approach},
  author={Yang, Nianzu and Li, Pandeng and Zhao, Liming and Li, Yang and Xie, Chen-Wei and Tang, Yehui and Lu, Xudong and Liu, Zhihang and Zheng, Yun and Liu, Yu and Yan, Junchi},
  journal={arXiv preprint arXiv:2503.03708},
  year={2025}
}

Feel free to reach out to me at yangnianzu@sjtu.edu.cn with any questions.