lyua1225/clip-huge-zh-75k-steps-bs4096

Zero-Shot Image Classification

lyua1225 committed on Dec 16, 2022

Commit 8bc3b83 • 1 Parent(s): 55e229d

Upload README.md

Files changed (1)

README.md +1 -1

@@ -53,7 +53,7 @@ Text encoder is the same structure as [open_clip/CLIP-VIT-H](https://huggingface
 3. Freeze the entire visual model, text encoder layer as well as the text projection layer. Only the text embedding layer is unfrozen. The purpose of this step is to align chinese word embedding with the original english word embedding such that the final projection latent space  would not drift far away.
 4. After a bunch of steps, unfreeze the entire text encoder for better convergence.
-Notation: We use clip loss to optimize chinese text encoder. Chinese subset of [LAION-5B](https://laion.ai/blog/laion-5b/) are chosen as our training set (around 85M text-image pairs). This model was trained 75k steps with 4096 batch size so it is not completely converged at all.
+Note: We use clip loss to optimize chinese text encoder. Chinese subset of [LAION-5B](https://laion.ai/blog/laion-5b/) are chosen as our training set (around 85M text-image pairs). This model was trained 75k steps with 4096 batch size so it is still far away from convergence.
 ## 使用 Usage