weifeng-chen committed
Commit ebf1867
1 Parent(s): d4a825f

add zero and achieve better result

Files changed (2)
  1. README.md +5 -6
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -15,8 +15,7 @@ tags:
 
 # Model Details
 
-This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/), which contains about 100M Chinese image-text pairs. We use ViT-L-14 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 10 epochs and it takes about 5 days with 16 A100 GPUs. **This is a beta version, We will continueously update this model**
-
+This model is a Chinese CLIP model trained on the [Noah-Wukong dataset (100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero (23M)](https://zero.so.com/). We use ViT-L-14 from [OpenAI](https://github.com/openai/CLIP) as the image encoder and the Chinese pre-trained language model [chinese-roberta-wwm-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) as the text encoder. We freeze the image encoder and fine-tune only the text encoder. The model was first trained for 10 epochs on Wukong and then for another 12 epochs on Wukong and Zero.
 # Taiyi (太乙)
 Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text models trained on Chinese datasets to benefit the Chinese community.
 
@@ -65,15 +64,15 @@ with torch.no_grad():
 
 | model | dataset | Top1 | Top5 |
 | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN | 51.72% | 78.46% |
+| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN | 53.05% | 79.55% |
 
 ### Zero-Shot Text-to-Image Retrieval
 
 | model | dataset | Top1 | Top5 | Top10 |
 | ---- | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test | 51.08% | 78.20% | 85.94% |
-| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test | 52.61% | 80.34% | 89.55% |
-| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k | 60.16% | 90.36% | 95.61% |
+| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test | 54.36% | 80.56% | 87.90% |
+| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test | 51.47% | 81.00% | 90.40% |
+| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k | 61.18% | 90.46% | 95.74% |
 
 
 # Citation
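For context on the Top-1/Top-5/Top-10 numbers in the tables above: CLIP-style zero-shot retrieval ranks candidates by image-text cosine similarity between the two encoders' embeddings. A minimal, self-contained sketch of just that ranking step, using toy NumPy embeddings and a hypothetical `zero_shot_rank` helper (not the model's actual code, which uses the full image and text encoders):

```python
import numpy as np

def zero_shot_rank(image_emb, text_embs):
    """Rank candidate text embeddings against one image embedding
    by cosine similarity, as in CLIP-style zero-shot retrieval.
    Both sides are L2-normalized first, so the dot product equals
    the cosine similarity."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb   # one similarity score per caption
    order = np.argsort(-sims)      # indices sorted best match first
    return order, sims

# Toy example: one image embedding and three candidate caption embeddings.
img = np.array([1.0, 0.0])
caps = np.array([[0.9, 0.1],    # nearly aligned with the image
                 [0.0, 1.0],    # orthogonal
                 [-1.0, 0.0]])  # opposite direction
order, sims = zero_shot_rank(img, caps)
# order[0] == 0: the first caption is the closest match
```

A Top-k score then counts how often the ground-truth caption's index appears in `order[:k]` across the test set.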
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ed08a68fa96e00beb89ad18dfe6112c6b57dc85cc9a0234a40bae6e1a58c491d
+oid sha256:13f50fdd2fa0e809a95d602b4d74552d2c27e3ebc08f40108a7d1cae20a7107b
 size 1305368941