Jinkin committed on
Commit
d1d71a7
1 Parent(s): e79b872

Update README.md

  1. README.md +8 -0
README.md CHANGED
@@ -1056,3 +1056,11 @@ model-index:
1056
  ---
1057
 
1058
  ## piccolo-base-zh
1059
+
1060
+ piccolo is a general-purpose text embedding model developed by the General Model Group at SenseTime Research.
1061
+ Built on the BERT framework, piccolo is trained with a two-stage pipeline. In the first stage, we collect and crawl 400 million weakly supervised Chinese text pairs from the Internet
1062
+ and train the model with a pair (text, text_pos) softmax contrastive loss.
1063
+ In the second stage, we collect 20 million human-labeled Chinese text pairs from open-source datasets and fine-tune the model with a triplet (text, text_pos, text_neg) contrastive loss.
1064
+ We currently offer two model sizes: piccolo-base-zh and piccolo-large-zh.
1065
+
1066
+
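
The two training objectives named in the added README text can be illustrated with a small sketch. The README does not give the exact formulation, so everything here is an assumption: the pair loss is written as an in-batch softmax (InfoNCE-style) loss over cosine similarities, the triplet loss adds the `text_neg` embeddings as extra candidates in the same softmax, and the function names and the `temperature=0.05` default are hypothetical. Embeddings are plain Python lists so the example runs without any deep learning library.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax_at(logits, index):
    """Numerically stable log-softmax value for the entry at `index`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def pair_softmax_loss(texts, positives, temperature=0.05):
    """Assumed stage-1 form: each text must pick out its own text_pos
    among all positives in the batch (in-batch negatives)."""
    q = [normalize(v) for v in texts]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        total -= log_softmax_at(logits, i)
    return total / len(q)


def triplet_softmax_loss(texts, positives, negatives, temperature=0.05):
    """Assumed stage-2 form: same softmax, with every text_neg in the
    batch appended as an additional wrong candidate."""
    q = [normalize(v) for v in texts]
    p = [normalize(v) for v in positives]
    n = [normalize(v) for v in negatives]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        logits += [dot(qi, nj) / temperature for nj in n]
        total -= log_softmax_at(logits, i)
    return total / len(q)


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real models would produce these from text.
    texts = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    negatives = [[0.0, 1.0], [1.0, 0.0]]
    print("pair loss:", pair_softmax_loss(texts, positives))
    print("triplet loss:", triplet_softmax_loss(texts, positives, negatives))
```

Under these assumptions, adding negatives can only enlarge the softmax denominator, so the triplet loss is never smaller than the pair loss on the same batch. That matches the intuition that the second stage poses a harder discrimination task than the first.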