Upload README.md
README.md
CHANGED
@@ -53,7 +53,7 @@ Text encoder is the same structure as [open_clip/CLIP-VIT-H](https://huggingface
 3. Freeze the entire visual model, the text encoder layers, and the text projection layer; only the text embedding layer is unfrozen. The purpose of this step is to align the Chinese word embeddings with the original English word embeddings so that the final projection latent space does not drift far from the original one.
 4. After a number of steps, unfreeze the entire text encoder for better convergence.
 
-
+Note: We use the CLIP loss to optimize the Chinese text encoder. The Chinese subset of [LAION-5B](https://laion.ai/blog/laion-5b/) is used as the training set (around 85M text-image pairs). This model was trained for 75k steps with a batch size of 4096, so it is still far from convergence.
 
 
 ## 使用 Usage
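
The added note refers to a "clip loss" without spelling it out. It is usually understood as the symmetric contrastive loss from CLIP: a cross-entropy over the image-text similarity matrix, averaged over both directions. The following is a minimal pure-Python sketch of that idea, not the training code from this repository. The function name `clip_loss`, the helper names, and the fixed `temperature` value are assumptions made for illustration (CLIP itself learns the temperature), and the embeddings are assumed to be non-zero vectors where row `i` of each list belongs to the same image-text pair.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same pair.
    The temperature is a fixed, illustrative value here.
    """
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text: each image should pick its own caption (row-wise).
    loss_i2t = sum(_cross_entropy(logits[i], i) for i in range(n)) / n

    # Text-to-image: each caption should pick its own image (column-wise).
    loss_t2i = sum(
        _cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Matching pairs should score a lower loss than mismatched ones.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In the setup described in steps 3 and 4, the visual model stays frozen, so under this sketch only the text-side embeddings would receive gradient updates; the image embeddings act as fixed targets.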