weifeng commited on
Commit
ab347b1
1 Parent(s): 98ccbcd

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -32,9 +32,9 @@ The first open source Chinese CLIP, pre-training on 123M image-text pairs, the t
32
 
33
  ## 模型信息 Model Information
34
 
35
- 我们遵循CLIP的实验设置,以获得强大的视觉-语言表征。在训练中文版的CLIP时,我们使用[chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext)作为语言的编码器,并将[CLIP](https://github.com/openai/CLIP)中的ViT-B-32应用于视觉的编码器。为了快速且稳定地进行预训练,我们冻结了视觉编码器并且只微调语言编码器。此外,我们将[Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/)数据集(100M)和[Zero](https://zero.so.com/)数据集(23M)用作预训练的数据集。据我们所知,我们的Taiyi-CLIP是目前Huggingface社区中首个的开源中文CLIP。
36
 
37
- We follow the experimental setup of CLIP to obtain powerful visual-language intelligence. To obtain the CLIP for Chinese, we employ [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) for the language encoder, and apply the ViT-B-32 in [CLIP](https://github.com/openai/CLIP) for the vision encoder. We freeze the vision encoder and tune the language encoder to speed up and stabilize the pre-training process. Moreover, we apply [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) dataset (100M) and [Zero](https://zero.so.com/) dataset (23M) as the pre-training datasets. To the best of our knowledge, our TaiyiCLIP is currently the only open-sourced Chinese CLIP in the huggingface community.
38
 
39
  ### 下游效果 Performance
40
 
 
32
 
33
  ## 模型信息 Model Information
34
 
35
+ 我们遵循CLIP的实验设置,以获得强大的视觉-语言表征。在训练中文版的CLIP时,我们使用[chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext)作为语言的编码器,并将[CLIP](https://github.com/openai/CLIP)中的ViT-B-32应用于视觉的编码器。为了快速且稳定地进行预训练,我们冻结了视觉编码器并且只微调语言编码器。此外,我们将[Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/)数据集(100M)和[Zero](https://zero.so.com/)数据集(23M)用作预训练的数据集,训练了24个epoch,在在A100x32上训练了7天。据我们所知,我们的Taiyi-CLIP是目前Huggingface社区中首个的开源中文CLIP。
36
 
37
+ We follow the experimental setup of CLIP to obtain powerful visual-language intelligence. To obtain the CLIP for Chinese, we employ [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) for the language encoder, and apply the ViT-B-32 in [CLIP](https://github.com/openai/CLIP) for the vision encoder. We freeze the vision encoder and tune the language encoder to speed up and stabilize the pre-training process. Moreover, we apply [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) dataset (100M) and [Zero](https://zero.so.com/) dataset (23M) as the pre-training datasets. We train 24 epochs, which takes 7 days to train on A100x16. To the best of our knowledge, our TaiyiCLIP is currently the only open-sourced Chinese CLIP in the huggingface community.
38
 
39
  ### 下游效果 Performance
40