---
license: mit
---

https://github.com/baaivision/EVA/tree/master/EVA-CLIP

# Model Card

## EVA-01-CLIP Series (MIM teacher: OpenAI CLIP-Large)

| model name | total #params | training precision | training data | training batch size | GPUs for training | IN-1K zero-shot top-1 | MSCOCO T2I R@5 | weight |
|---|---|---|---|---|---|---|---|---|
| `EVA01_CLIP_g_14_psz14_s11B` | 1.1B | `fp16` | LAION-400M | 41K | 256× A100 (40GB) | 78.5 | 68.5 | 🤗 HF link (2.2GB) |
| `EVA01_CLIP_g_14_plus_psz14_s11B` | 1.3B | `fp16` | Merged-2B | 114K | 112× A100 (40GB) | 79.3 | 74.0 | 🤗 HF link (2.7GB) |

## EVA-02-CLIP Series (MIM teacher: EVA01_CLIP_g_14_psz14_s11B)

| model name | total #params | training precision | training data | training batch size | GPUs for training | IN-1K zero-shot top-1 | MSCOCO T2I R@5 | weight |
|---|---|---|---|---|---|---|---|---|
| `EVA02_CLIP_B_psz16_s8B` | 149M | `fp16` | Merged-2B | 131K | 64× A100 (40GB) | 74.7 | 66.9 | 🤗 HF link (300MB) |
| `EVA02_CLIP_L_psz14_s4B` | 428M | `fp16` | Merged-2B | 131K | 128× A100 (40GB) | 79.8 | 71.2 | 🤗 HF link (856MB) |
| `EVA02_CLIP_L_336_psz14_s6B` | 428M | `fp16` | Merged-2B | 61K | 128× A100 (40GB) | 80.4 | 71.7 | 🤗 HF link (856MB) |
| `EVA02_CLIP_E_psz14_s4B` | 4.7B | `fp16` | LAION-2B | 144K | 144× A100 (80GB) | 81.9 | 74.7 | 🤗 HF link (9.4GB) |
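For convenience, here is a minimal sketch of zero-shot classification with one of the checkpoints above. It assumes the weights are consumable through the `open_clip` package; the model name, pretrained tag, and image path below are illustrative assumptions, and the authors' own loading code lives in the GitHub repo linked above.

```python
import torch
from PIL import Image
import open_clip  # pip install open_clip_torch

# Assumed open_clip registry entries for EVA-02-CLIP-L/14; see the
# GitHub repo above for the authors' canonical loading code.
model, _, preprocess = open_clip.create_model_and_transforms(
    "EVA02-L-14", pretrained="merged2b_s4b_b131k"
)
tokenizer = open_clip.get_tokenizer("EVA02-L-14")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical image
texts = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_feats = model.encode_image(image)
    text_feats = model.encode_text(texts)
    # L2-normalize, then scaled cosine similarity gives zero-shot logits.
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feats @ text_feats.T).softmax(dim=-1)

print(probs)  # one probability per candidate caption
```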
- To construct Merged-2B, we merged 1.6 billion samples from the LAION-2B dataset with 0.4 billion samples from COYO-700M.

- To our knowledge, the EVA-CLIP series comprises the most performant open-source CLIP models at all scales, as measured by zero-shot classification accuracy, especially on mainstream benchmarks such as ImageNet-1K and its variants. For more details about EVA-CLIP, please refer to our paper (coming very soon).
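The MSCOCO T2I R@5 column above reports text-to-image Recall@5: for each caption, whether its ground-truth image appears among the five most similar images under the model's embeddings. Below is a minimal sketch of that metric with placeholder features and pairings, not the authors' evaluation code.

```python
import torch
import torch.nn.functional as F

def t2i_recall_at_k(text_feats, image_feats, gt_image_idx, k=5):
    """Fraction of text queries whose paired image ranks in the top-k
    by cosine similarity; features are assumed L2-normalized."""
    sims = text_feats @ image_feats.T              # (num_texts, num_images)
    topk = sims.topk(k, dim=-1).indices            # top-k image indices per text
    hits = (topk == gt_image_idx.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

# Placeholder data: 5,000 captions vs. 1,000 images, 768-d embeddings.
text_feats = F.normalize(torch.randn(5000, 768), dim=-1)
image_feats = F.normalize(torch.randn(1000, 768), dim=-1)
gt_image_idx = torch.randint(0, 1000, (5000,))
print(f"T2I R@5: {t2i_recall_at_k(text_feats, image_feats, gt_image_idx):.3f}")
```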