Update README.md
README.md
CHANGED
@@ -1060,8 +1060,8 @@ library_name: sentence-transformers
 ---
 <h2 align="left">ZPoint Large Embedding for Chinese</h2>
 
-
-
+- **[2024-06-04]** Release zpoint_large_embedding_zh, and upload model weight to huggingface
+- **[2024-06-05]** Add training details
 
 ### Training Details
 
@@ -1081,7 +1081,7 @@ library_name: sentence-transformers
 - Finally, total amount of synthesized data is about 30 million.
 
 3) **Collect more data for retrieval-type tasks**
-- We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics)
+- ***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
 - [miracl/miracl](https://huggingface.co/datasets/miracl/miracl)
 - [FreedomIntelligence/Huatuo26M-Lite](https://huggingface.co/datasets/FreedomIntelligence/Huatuo26M-Lite)
 - [PaddlePaddle/dureader_robust](https://huggingface.co/datasets/PaddlePaddle/dureader_robust) **C-MTEB test filtered**
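The **C-MTEB test filtered** note on the dureader_robust entry indicates that training samples overlapping the benchmark test data were removed, but the README does not say how. The following is a minimal sketch of one possible approach, assuming exact-match filtering on normalized query text. The function names, the `query` field, and the toy data are all hypothetical and are not taken from the model authors' pipeline.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize, lowercase, and collapse whitespace before comparison."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def filter_test_overlap(train_samples, test_queries, key="query"):
    """Drop training samples whose `key` field equals a test query after normalization.

    Assumption: exact match on normalized text. The real filtering rule used
    for this model is not documented in the README.
    """
    blocked = {normalize(q) for q in test_queries}
    return [s for s in train_samples if normalize(s[key]) not in blocked]


# Toy data, for illustration only.
train = [
    {"query": "How is hypertension treated?", "positive": "Common treatments include lifestyle changes."},
    {"query": "What is  the capital of France?", "positive": "Paris is the capital of France."},
]
test_queries = ["what is the capital of france?"]

kept = filter_test_overlap(train, test_queries)
```

Exact matching only catches verbatim or near-verbatim leaks. Paraphrased overlaps would need a fuzzier criterion, such as n-gram overlap or embedding similarity, which this sketch does not attempt.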