sensenova
/

piccolo-base-zh

Feature Extraction

text-embeddings-inference

Inference Endpoints

Model card Files Files and versions Community

Jinkin commited on Sep 5, 2023

Commit

736658f

•

1 Parent(s): 680ab9f

Update README.md

Files changed (1) hide show

README.md +12 -0

README.md CHANGED Viewed

@@ -1068,4 +1068,16 @@ and train the model with the pair(text and text pos) softmax contrastive loss.
 On the second stage, we collect 20 million human labeled chinese text pairs from the open-source dataset, and finetune the model with tiplet (text, text_pos, text_neg) contrastive loss.
 Currently here we offer two different sizes of models, including piccolo-base-zh, piccolo-large-zh.

 On the second stage, we collect 20 million human labeled chinese text pairs from the open-source dataset, and finetune the model with tiplet (text, text_pos, text_neg) contrastive loss.
 Currently here we offer two different sizes of models, including piccolo-base-zh, piccolo-large-zh.
+## Metric
+我们将piccolo与其他的开源embedding模型在CMTEB榜单上进行了比较，请参考CMTEB榜单。我们在eval文件夹中提供了复现结果的脚本。
+We compared the performance of the piccolo with other embedding models on the C-MTEB benchmark. please refer to the C-MTEB leaderboard. wo provide scripts in "eval" folder for results reproducing.
+| Model Name | Model Size (GB) | Dimension | Sequence Length | Average (35) | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) |
+|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| [**piccolo-large-zh**] | 0.65 | 1024 | 512 | **64.11** | 67.03 | 47.04 | 78.38 | 65.98 | 70.93 | 58.02 |
+| [bge-large-zh]| 1.3 | 1024| 512 | 63.96 | 68.32 | 48.39 | 78.94 | 65.11 | 71.52 | 54.98 |
+| [**piccolo-base-zh**]| 0.2 | 768 | 512 | **63.66** | 66.98 | 47.12 | 76.61 | 66.68 | 71.2 | 55.9 |
+| [bge-large-zh-no-instruct]| 1.3 | 1024 | 512 | 63.4 | 68.58 | 50.01 | 76.77 | 64.9 | 70.54 | 53 |
+| [bge-base-zh]| 0.41 | 768 | 512 | 62.8 | 67.07 | 47.64 | 77.5 | 64.91 | 69.53 | 54.12 |