
Model card for CLIP ViT-T-16, trained on CC3M+CC12M with CLIP-KD knowledge distillation from a CC3M+CC12M CLIP ViT-B-16 teacher

Model Description

A CLIP ViT-T/16 model (46.1M parameters, F32 weights) trained with the CLIP-KD method on the combined CC3M and CC12M datasets, distilled from a CLIP ViT-B-16 teacher. The weights were converted from the open_clip checkpoint ViT-B-16_cc3m_12m_kd_ViT-T-16_cc3m_12m_ep32.pt to the Hugging Face CLIP format.
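Since the weights are in the Hugging Face CLIP format, the model should load with the standard `transformers` CLIP classes. A minimal sketch of zero-shot image scoring, assuming the repo id `romrawinjp/clip-kd_ViT-T-16-CC3M12M_KD-CC3M12M` from this page's collection link (the repo is gated, so log in with `huggingface-cli login` after accepting the access conditions):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Repo id assumed from the collection link on this page; gated access applies.
REPO_ID = "romrawinjp/clip-kd_ViT-T-16-CC3M12M_KD-CC3M12M"

def zero_shot_probs(image, texts, repo_id=REPO_ID):
    """Score a PIL image against candidate captions with the distilled model."""
    model = CLIPModel.from_pretrained(repo_id)
    processor = CLIPProcessor.from_pretrained(repo_id)
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Softmax over the image-text similarity logits gives per-caption probabilities.
    return out.logits_per_image.softmax(dim=-1)

# Example usage (requires accepted access conditions and a logged-in token):
# from PIL import Image
# probs = zero_shot_probs(Image.open("photo.jpg"),
#                         ["a photo of a cat", "a photo of a dog"])
```

This mirrors standard CLIP zero-shot usage; only the repo id is specific to this checkpoint.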

Reference

Please refer to the original work.

@inproceedings{yang2024clip,
  title={CLIP-KD: An Empirical Study of CLIP Model Distillation},
  author={Yang, Chuanguang and An, Zhulin and Huang, Libo and Bi, Junyu and Yu, Xinqiang and Yang, Han and Diao, Boyu and Xu, Yongjun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}