weifeng-chen committed
Commit ebf1867
1 Parent(s): d4a825f

add zero and achieve better result

Files changed (2)
  1. README.md +5 -6
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -15,8 +15,7 @@ tags:
 
 # Model Details
 
-This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/), which contains about 100M Chinese image-text pairs. We use ViT-L-14 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 10 epochs and it takes about 5 days with 16 A100 GPUs. **This is a beta version, We will continueously update this model**
-
+This model is a Chinese CLIP model trained on the [Noah-Wukong dataset (100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero (23M)](https://zero.so.com/). We use ViT-L-14 from [OpenAI](https://github.com/openai/CLIP) as the image encoder and the Chinese pre-trained language model [chinese-roberta-wwm-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) as the text encoder. We freeze the image encoder and fine-tune only the text encoder. The model was first trained for 10 epochs on Wukong and then for another 12 epochs on Wukong and Zero.
 # Taiyi (太乙)
 Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text models trained on Chinese datasets to benefit the Chinese community.
 
@@ -65,15 +64,15 @@ with torch.no_grad():
 
 | model | dataset | Top1 | Top5 |
 | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN | 51.72% | 78.46% |
+| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN | 53.05% | 79.55% |
 
 ### Zero-Shot Text-to-Image Retrieval
 
 | model | dataset | Top1 | Top5 | Top10 |
 | ---- | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test | 51.08% | 78.20% | 85.94% |
-| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test | 52.61% | 80.34% | 89.55% |
-| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k | 60.16% | 90.36% | 95.61% |
+| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test | 54.36% | 80.56% | 87.90% |
+| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test | 51.47% | 81.00% | 90.40% |
+| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k | 61.18% | 90.46% | 95.74% |
 
 
 # Citation
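For context on the Top-1/Top-5/Top-10 numbers in the tables above: CLIP-style zero-shot retrieval ranks candidates by image-text cosine similarity between the two encoders' embeddings. A minimal, self-contained sketch of just that ranking step, using toy NumPy embeddings and a hypothetical `zero_shot_rank` helper (not the model's actual code, which uses the full image and text encoders):

```python
import numpy as np

def zero_shot_rank(image_emb, text_embs):
    """Rank candidate text embeddings against one image embedding
    by cosine similarity, as in CLIP-style zero-shot retrieval.
    Both sides are L2-normalized first, so the dot product equals
    the cosine similarity."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb   # one similarity score per caption
    order = np.argsort(-sims)      # indices sorted best match first
    return order, sims

# Toy example: one image embedding and three candidate caption embeddings.
img = np.array([1.0, 0.0])
caps = np.array([[0.9, 0.1],    # nearly aligned with the image
                 [0.0, 1.0],    # orthogonal
                 [-1.0, 0.0]])  # opposite direction
order, sims = zero_shot_rank(img, caps)
# order[0] == 0: the first caption is the closest match
```

A Top-k score then counts how often the ground-truth caption's index appears in `order[:k]` across the test set.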
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ed08a68fa96e00beb89ad18dfe6112c6b57dc85cc9a0234a40bae6e1a58c491d
+oid sha256:13f50fdd2fa0e809a95d602b4d74552d2c27e3ebc08f40108a7d1cae20a7107b
 size 1305368941