chen committed on
Commit
901bc96
1 Parent(s): ec67cfd

add zero dataset and achieve better result

Files changed (2):
  1. README.md +4 -4
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 
 # Model Details
 
-This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/), which contains about 100M Chinese image-text pairs. We use ViT-B-32 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 20 epochs and it takes about 10 days with 8 A100 GPUs.
+This model is a Chinese CLIP model trained on [Noah-Wukong Dataset(100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero(23M)](https://zero.so.com/). We use ViT-B-32 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 24 epochs and it takes about 10 days with 16 A100 GPUs.
 
 # Taiyi (太乙)
 Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text models trained on Chinese datasets to benefit the Chinese community.
@@ -65,14 +65,14 @@ with torch.no_grad():
 ### Zero-Shot Classification
 | model | dataset | Top1 | Top5 |
 | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN | 41.00% | 69.19% |
+| Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN | 42.85% | 71.48% |
 
 ### Zero-Shot Text-to-Image Retrieval
 
 | model | dataset | Top1 | Top5 | Top10 |
 | ---- | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test | 44.06% | 71.42% | 80.84% |
-| Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test | 46.24% | 78.06% | 88.88% |
+| Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test | 46.32% | 74.58% | 83.44% |
+| Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test | 47.10% | 78.53% | 87.84% |
 | Taiyi-CLIP-Roberta-102M-Chinese | wukong50k | 48.67% | 81.77% | 90.09% |
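The Top1/Top5/Top10 numbers updated above are standard top-k accuracies: the fraction of queries whose correct match ranks among the k highest-scoring candidates. A minimal NumPy sketch of how such a metric is computed from a query-candidate similarity matrix (this is illustrative, not the repository's actual evaluation code; it assumes query i's true match sits at candidate index i):

```python
import numpy as np

def topk_accuracy(sim: np.ndarray, k: int) -> float:
    """Fraction of rows whose true match (assumed to be the diagonal
    entry, i.e. candidate i for query i) is among the k largest scores."""
    n = sim.shape[0]
    # Indices of the k largest scores per row; order within the top-k
    # does not matter for the metric, so argpartition suffices.
    topk = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    hits = (topk == np.arange(n)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy similarity matrix: 4 queries x 4 candidates, true match on the diagonal.
sim = np.array([
    [0.9, 0.1, 0.2, 0.3],  # true match ranked 1st
    [0.8, 0.7, 0.1, 0.2],  # true match ranked 2nd
    [0.5, 0.6, 0.4, 0.3],  # true match ranked 3rd
    [0.1, 0.2, 0.3, 0.9],  # true match ranked 1st
])
print(topk_accuracy(sim, 1))  # 0.5
print(topk_accuracy(sim, 2))  # 0.75
```

In the tables above, `sim` would be the cosine similarities between text-query embeddings and image embeddings over the whole test set.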
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:53ec5505ee1ce25f970c5ce488bbd49b5727c36faa2132de0f2cf82dddbf3e37
+oid sha256:d679dcce5801d600bce716e1fa3e13508812b9cb4ff0ff6101d12a96b3a4eae9
 size 410713709
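The `pytorch_model.bin` change only swaps the git-lfs pointer file: the `oid sha256:` line records the hash of the actual weight blob, which LFS fetches on checkout. A quick sanity check that a pulled file matches its pointer is to hash it and compare against the `oid` (illustrative only; the file and hash below are for the string `hello`, not the model weights):

```shell
# An LFS pointer stores the sha256 of the real blob; after `git lfs pull`,
# hashing the local file should reproduce the pointer's oid value.
printf 'hello' > demo.bin
sha256sum demo.bin | cut -d' ' -f1
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Since `size` is unchanged (410713709 bytes) while `oid` differs, this commit replaced the weights with a retrained checkpoint of identical architecture.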