Update README.md

README.md CHANGED

@@ -1081,7 +1081,6 @@ library_name: sentence-transformers
 - Finally, total amount of synthesized data is about 30 million.
 
 3) **Collect more data for retrieval-type tasks**
-***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
 - [miracl/miracl](https://huggingface.co/datasets/miracl/miracl)
 - [FreedomIntelligence/Huatuo26M-Lite](https://huggingface.co/datasets/FreedomIntelligence/Huatuo26M-Lite)
 - [PaddlePaddle/dureader_robust](https://huggingface.co/datasets/PaddlePaddle/dureader_robust) **C-MTEB test filtered**
@@ -1090,6 +1089,9 @@ library_name: sentence-transformers
 - [Shitao/MLDR](https://huggingface.co/datasets/Shitao/MLDR)
 - ...
 
+***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
+
+
 **Training loss**
 1) Multi-Task loss like [Piccolo](https://huggingface.co/sensenova/piccolo-large-zh-v2)
 2) Matryoshka Representation Learning
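The model card names Matryoshka Representation Learning as a training loss. The core idea can be sketched in a few lines of NumPy: compute the contrastive loss not just on the full embedding but on several truncated prefixes, and average, so each prefix works as a standalone lower-dimensional embedding. This is only an illustrative sketch, assuming an in-batch InfoNCE base loss (the card does not say which base loss it pairs with MRL) and made-up truncation dimensions.

```python
import numpy as np

def cosine_scores(queries, docs):
    # L2-normalize the rows, then a dot product gives pairwise cosine similarity
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return queries @ docs.T

def info_nce(scores, temperature=0.05):
    # in-batch contrastive (InfoNCE) loss; diagonal entries are the positives
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

def matryoshka_loss(queries, docs, dims=(256, 128, 64)):
    # average the base loss over truncated embedding prefixes, so every
    # prefix is trained to be usable as a lower-dimensional embedding
    return float(np.mean([info_nce(cosine_scores(queries[:, :k], docs[:, :k]))
                          for k in dims]))

# toy usage: 8 query/document pairs with 256-dim embeddings,
# positives generated as lightly perturbed copies of the queries
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 256))
d = q + 0.05 * rng.normal(size=(8, 256))
print(matryoshka_loss(q, d))
```

In practice the `sentence-transformers` library ships a `MatryoshkaLoss` wrapper that applies this truncate-and-average scheme to any base loss, rather than hand-rolling it as above.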