Update README.md

README.md CHANGED

@@ -1081,7 +1081,6 @@ library_name: sentence-transformers
 - Finally, total amount of synthesized data is about 30 million.
 
 3) **Collect more data for retrieval-type tasks**
-***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
 - [miracl/miracl](https://huggingface.co/datasets/miracl/miracl)
 - [FreedomIntelligence/Huatuo26M-Lite](https://huggingface.co/datasets/FreedomIntelligence/Huatuo26M-Lite)
 - [PaddlePaddle/dureader_robust](https://huggingface.co/datasets/PaddlePaddle/dureader_robust) **C-MTEB test filtered**
@@ -1090,6 +1089,9 @@ library_name: sentence-transformers
 - [Shitao/MLDR](https://huggingface.co/datasets/Shitao/MLDR)
 - ...
 
+***We constructed a dataset of approximately 100 million training samples through collection, machine translation, and LLM synthesis. This dataset includes data from various fields such as healthcare, law, electricity, automotive, and 3C (Consumer Electronics).***
+
+
 **Training loss**
 1) Multi-Task loss like [Piccolo](https://huggingface.co/sensenova/piccolo-large-zh-v2)
 2) Matryoshka Representation Learning
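The model card names Matryoshka Representation Learning as a training loss. The core idea can be sketched in a few lines of NumPy: compute the contrastive loss not just on the full embedding but on several truncated prefixes, and average, so each prefix works as a standalone lower-dimensional embedding. This is only an illustrative sketch, assuming an in-batch InfoNCE base loss (the card does not say which base loss it pairs with MRL) and made-up truncation dimensions.

```python
import numpy as np

def cosine_scores(queries, docs):
    # L2-normalize the rows, then a dot product gives pairwise cosine similarity
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return queries @ docs.T

def info_nce(scores, temperature=0.05):
    # in-batch contrastive (InfoNCE) loss; diagonal entries are the positives
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

def matryoshka_loss(queries, docs, dims=(256, 128, 64)):
    # average the base loss over truncated embedding prefixes, so every
    # prefix is trained to be usable as a lower-dimensional embedding
    return float(np.mean([info_nce(cosine_scores(queries[:, :k], docs[:, :k]))
                          for k in dims]))

# toy usage: 8 query/document pairs with 256-dim embeddings,
# positives generated as lightly perturbed copies of the queries
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 256))
d = q + 0.05 * rng.normal(size=(8, 256))
print(matryoshka_loss(q, d))
```

In practice the `sentence-transformers` library ships a `MatryoshkaLoss` wrapper that applies this truncate-and-average scheme to any base loss, rather than hand-rolling it as above.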