--- tags: - mteb model-index: - name: piccolo-large-zh-v2 results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: None metrics: - type: cos_sim_pearson value: 56.76055988260572 - type: cos_sim_spearman value: 61.49271876861677 - type: euclidean_pearson value: 59.14524585320711 - type: euclidean_spearman value: 60.63579339225774 - type: manhattan_pearson value: 59.14662752965445 - type: manhattan_spearman value: 60.635190265737904 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 56.21706298831197 - type: cos_sim_spearman value: 59.19831457688953 - type: euclidean_pearson value: 62.37752017633299 - type: euclidean_spearman value: 58.79400967473204 - type: manhattan_pearson value: 62.37015943212308 - type: manhattan_spearman value: 58.79232537600814 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.440000000000005 - type: f1 value: 46.67381446305019 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: None metrics: - type: cos_sim_pearson value: 70.99026329599994 - type: cos_sim_spearman value: 72.87565357908989 - type: euclidean_pearson value: 71.17690439270028 - type: euclidean_spearman value: 72.50428109969029 - type: manhattan_pearson value: 71.17262321033088 - type: manhattan_spearman value: 72.49845447987437 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 57.92713421071616 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 48.096546680932235 - task: type: 
Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 89.31003741715936 - type: mrr value: 91.38075396825397 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 90.13769781784876 - type: mrr value: 92.14329365079365 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 26.931 - type: map_at_10 value: 40.647 - type: map_at_100 value: 42.519 - type: map_at_1000 value: 42.616 - type: map_at_3 value: 36.144999999999996 - type: map_at_5 value: 38.717 - type: mrr_at_1 value: 40.935 - type: mrr_at_10 value: 49.684 - type: mrr_at_100 value: 50.598 - type: mrr_at_1000 value: 50.632999999999996 - type: mrr_at_3 value: 47.07 - type: mrr_at_5 value: 48.49 - type: ndcg_at_1 value: 40.935 - type: ndcg_at_10 value: 47.583999999999996 - type: ndcg_at_100 value: 54.69199999999999 - type: ndcg_at_1000 value: 56.314 - type: ndcg_at_3 value: 41.973 - type: ndcg_at_5 value: 44.334 - type: precision_at_1 value: 40.935 - type: precision_at_10 value: 10.585 - type: precision_at_100 value: 1.637 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 23.881 - type: precision_at_5 value: 17.399 - type: recall_at_1 value: 26.931 - type: recall_at_10 value: 59.006 - type: recall_at_100 value: 88.247 - type: recall_at_1000 value: 99.045 - type: recall_at_3 value: 42.064 - type: recall_at_5 value: 49.266 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 86.08538785327721 - type: cos_sim_ap value: 92.64373114205229 - type: cos_sim_f1 value: 86.89951395953432 - type: cos_sim_precision value: 84.11378555798687 - type: cos_sim_recall value: 89.87608136544307 - type: 
dot_accuracy value: 72.66386049308478 - type: dot_ap value: 81.053422935767 - type: dot_f1 value: 75.19933726830277 - type: dot_precision value: 67.4907063197026 - type: dot_recall value: 84.89595510872107 - type: euclidean_accuracy value: 85.52014431749849 - type: euclidean_ap value: 91.90647782899615 - type: euclidean_f1 value: 86.26361413647477 - type: euclidean_precision value: 82.2071595001059 - type: euclidean_recall value: 90.74117371989713 - type: manhattan_accuracy value: 85.48406494287433 - type: manhattan_ap value: 91.89657919524385 - type: manhattan_f1 value: 86.20413761572752 - type: manhattan_precision value: 84.324686940966 - type: manhattan_recall value: 88.16927753097966 - type: max_accuracy value: 86.08538785327721 - type: max_ap value: 92.64373114205229 - type: max_f1 value: 86.89951395953432 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 75.50099999999999 - type: map_at_10 value: 83.43 - type: map_at_100 value: 83.577 - type: map_at_1000 value: 83.57900000000001 - type: map_at_3 value: 82.06400000000001 - type: map_at_5 value: 82.88600000000001 - type: mrr_at_1 value: 75.869 - type: mrr_at_10 value: 83.536 - type: mrr_at_100 value: 83.682 - type: mrr_at_1000 value: 83.68299999999999 - type: mrr_at_3 value: 82.244 - type: mrr_at_5 value: 82.998 - type: ndcg_at_1 value: 75.764 - type: ndcg_at_10 value: 86.777 - type: ndcg_at_100 value: 87.36 - type: ndcg_at_1000 value: 87.424 - type: ndcg_at_3 value: 84.10300000000001 - type: ndcg_at_5 value: 85.532 - type: precision_at_1 value: 75.764 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 30.207 - type: precision_at_5 value: 18.82 - type: recall_at_1 value: 75.50099999999999 - type: recall_at_10 value: 96.997 - type: recall_at_100 value: 99.473 - type: recall_at_1000 value: 100.0 - type: 
recall_at_3 value: 89.831 - type: recall_at_5 value: 93.256 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 27.094 - type: map_at_10 value: 82.418 - type: map_at_100 value: 85.05 - type: map_at_1000 value: 85.083 - type: map_at_3 value: 57.68600000000001 - type: map_at_5 value: 72.476 - type: mrr_at_1 value: 92.25 - type: mrr_at_10 value: 94.621 - type: mrr_at_100 value: 94.675 - type: mrr_at_1000 value: 94.677 - type: mrr_at_3 value: 94.375 - type: mrr_at_5 value: 94.52199999999999 - type: ndcg_at_1 value: 92.25 - type: ndcg_at_10 value: 89.13600000000001 - type: ndcg_at_100 value: 91.532 - type: ndcg_at_1000 value: 91.836 - type: ndcg_at_3 value: 88.50099999999999 - type: ndcg_at_5 value: 87.251 - type: precision_at_1 value: 92.25 - type: precision_at_10 value: 42.295 - type: precision_at_100 value: 4.812 - type: precision_at_1000 value: 0.48900000000000005 - type: precision_at_3 value: 79.167 - type: precision_at_5 value: 66.56 - type: recall_at_1 value: 27.094 - type: recall_at_10 value: 89.816 - type: recall_at_100 value: 97.855 - type: recall_at_1000 value: 99.384 - type: recall_at_3 value: 59.557 - type: recall_at_5 value: 76.395 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 53.6 - type: map_at_10 value: 62.985 - type: map_at_100 value: 63.532999999999994 - type: map_at_1000 value: 63.546 - type: map_at_3 value: 60.617 - type: map_at_5 value: 62.017 - type: mrr_at_1 value: 53.6 - type: mrr_at_10 value: 62.985 - type: mrr_at_100 value: 63.532999999999994 - type: mrr_at_1000 value: 63.546 - type: mrr_at_3 value: 60.617 - type: mrr_at_5 value: 62.017 - type: ndcg_at_1 value: 53.6 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.366 - type: ndcg_at_1000 value: 70.696 - type: ndcg_at_3 value: 62.89900000000001 - type: ndcg_at_5 
value: 65.437 - type: precision_at_1 value: 53.6 - type: precision_at_10 value: 8.28 - type: precision_at_100 value: 0.9490000000000001 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 23.166999999999998 - type: precision_at_5 value: 15.14 - type: recall_at_1 value: 53.6 - type: recall_at_10 value: 82.8 - type: recall_at_100 value: 94.89999999999999 - type: recall_at_1000 value: 97.5 - type: recall_at_3 value: 69.5 - type: recall_at_5 value: 75.7 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: None metrics: - type: accuracy value: 52.104655636783384 - type: f1 value: 41.025743582860514 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: None metrics: - type: accuracy value: 88.57410881801127 - type: ap value: 59.49612312498937 - type: f1 value: 83.70595013666741 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 74.00327736048256 - type: cos_sim_spearman value: 79.5459672237356 - type: euclidean_pearson value: 79.18300205389669 - type: euclidean_spearman value: 79.21872988987533 - type: manhattan_pearson value: 79.1715470733081 - type: manhattan_spearman value: 79.20756273498812 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 66.94600000000001 - type: map_at_10 value: 75.947 - type: map_at_100 value: 76.268 - type: map_at_1000 value: 76.28 - type: map_at_3 value: 74.13300000000001 - type: map_at_5 value: 75.28399999999999 - type: mrr_at_1 value: 69.241 - type: mrr_at_10 value: 76.532 - type: mrr_at_100 value: 76.816 - type: mrr_at_1000 value: 76.827 - type: mrr_at_3 value: 74.95 - type: mrr_at_5 value: 75.957 - type: ndcg_at_1 value: 69.241 - type: ndcg_at_10 value: 
79.54299999999999 - type: ndcg_at_100 value: 80.95 - type: ndcg_at_1000 value: 81.252 - type: ndcg_at_3 value: 76.119 - type: ndcg_at_5 value: 78.069 - type: precision_at_1 value: 69.241 - type: precision_at_10 value: 9.576 - type: precision_at_100 value: 1.026 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 28.571999999999996 - type: precision_at_5 value: 18.181 - type: recall_at_1 value: 66.94600000000001 - type: recall_at_10 value: 90.024 - type: recall_at_100 value: 96.3 - type: recall_at_1000 value: 98.656 - type: recall_at_3 value: 81.026 - type: recall_at_5 value: 85.658 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.71015467383997 - type: f1 value: 74.32345894845358 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 85.63214525891055 - type: f1 value: 84.65303466003252 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 55.50000000000001 - type: map_at_10 value: 61.66199999999999 - type: map_at_100 value: 62.13999999999999 - type: map_at_1000 value: 62.187000000000005 - type: map_at_3 value: 59.967000000000006 - type: map_at_5 value: 60.927 - type: mrr_at_1 value: 55.7 - type: mrr_at_10 value: 61.76199999999999 - type: mrr_at_100 value: 62.241 - type: mrr_at_1000 value: 62.287000000000006 - type: mrr_at_3 value: 60.06700000000001 - type: mrr_at_5 value: 61.027 - type: ndcg_at_1 value: 55.50000000000001 - type: ndcg_at_10 value: 64.878 - type: ndcg_at_100 value: 67.464 - type: ndcg_at_1000 value: 68.745 - type: ndcg_at_3 value: 61.367000000000004 - type: ndcg_at_5 
value: 63.117999999999995 - type: precision_at_1 value: 55.50000000000001 - type: precision_at_10 value: 7.51 - type: precision_at_100 value: 0.878 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 21.8 - type: precision_at_5 value: 13.94 - type: recall_at_1 value: 55.50000000000001 - type: recall_at_10 value: 75.1 - type: recall_at_100 value: 87.8 - type: recall_at_1000 value: 97.89999999999999 - type: recall_at_3 value: 65.4 - type: recall_at_5 value: 69.69999999999999 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 33.386980266936106 - type: mrr value: 32.11904761904762 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: None metrics: - type: accuracy value: 79.08666666666666 - type: f1 value: 78.93142205976953 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 84.35300487276665 - type: cos_sim_ap value: 87.83572265803564 - type: cos_sim_f1 value: 85.42713567839195 - type: cos_sim_precision value: 81.49568552253116 - type: cos_sim_recall value: 89.7571277719113 - type: dot_accuracy value: 72.87493232268544 - type: dot_ap value: 80.29032993894747 - type: dot_f1 value: 76.5938475256353 - type: dot_precision value: 66.28086419753086 - type: dot_recall value: 90.70749736008447 - type: euclidean_accuracy value: 82.34975636166757 - type: euclidean_ap value: 85.73873757468064 - type: euclidean_f1 value: 83.56713426853707 - type: euclidean_precision value: 79.50428979980934 - type: euclidean_recall value: 88.0675818373812 - type: manhattan_accuracy value: 82.45804006497022 - type: manhattan_ap value: 85.7176464290469 - type: manhattan_f1 value: 83.65095285857572 - type: manhattan_precision value: 79.65616045845272 - 
type: manhattan_recall value: 88.0675818373812 - type: max_accuracy value: 84.35300487276665 - type: max_ap value: 87.83572265803564 - type: max_f1 value: 85.42713567839195 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: None metrics: - type: accuracy value: 94.61999999999999 - type: ap value: 92.74140430219491 - type: f1 value: 94.60775857122515 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: None metrics: - type: cos_sim_pearson value: 39.75749234575995 - type: cos_sim_spearman value: 46.48035295363829 - type: euclidean_pearson value: 45.38711981599582 - type: euclidean_spearman value: 46.13915356562481 - type: manhattan_pearson value: 45.420770530489065 - type: manhattan_spearman value: 46.179913441143775 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 44.02008249965321 - type: cos_sim_spearman value: 45.906917552219156 - type: euclidean_pearson value: 36.600317631983316 - type: euclidean_spearman value: 41.97740958824762 - type: manhattan_pearson value: 36.54329048509785 - type: manhattan_spearman value: 41.91222171040451 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: None metrics: - type: cos_sim_pearson value: 60.97044608578288 - type: cos_sim_spearman value: 63.76187490245927 - type: euclidean_pearson value: 60.74245987426317 - type: euclidean_spearman value: 63.32990713078846 - type: manhattan_pearson value: 60.62422616577702 - type: manhattan_spearman value: 63.256612476686826 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: None metrics: - type: cos_sim_pearson value: 76.28185867362305 - type: cos_sim_spearman value: 78.71478656159289 - type: euclidean_pearson value: 79.80734359535234 - type: 
euclidean_spearman value: 79.85403491297063 - type: manhattan_pearson value: 79.79454037962215 - type: manhattan_spearman value: 79.82796402623201 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.14759526113295 - type: mrr value: 77.36422096484723 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 28.177999999999997 - type: map_at_10 value: 78.77199999999999 - type: map_at_100 value: 82.365 - type: map_at_1000 value: 82.422 - type: map_at_3 value: 55.452999999999996 - type: map_at_5 value: 68.12700000000001 - type: mrr_at_1 value: 91.097 - type: mrr_at_10 value: 93.52000000000001 - type: mrr_at_100 value: 93.587 - type: mrr_at_1000 value: 93.589 - type: mrr_at_3 value: 93.136 - type: mrr_at_5 value: 93.381 - type: ndcg_at_1 value: 91.097 - type: ndcg_at_10 value: 86.136 - type: ndcg_at_100 value: 89.515 - type: ndcg_at_1000 value: 90.049 - type: ndcg_at_3 value: 87.41600000000001 - type: ndcg_at_5 value: 86.115 - type: precision_at_1 value: 91.097 - type: precision_at_10 value: 42.597 - type: precision_at_100 value: 5.043 - type: precision_at_1000 value: 0.517 - type: precision_at_3 value: 76.239 - type: precision_at_5 value: 63.93 - type: recall_at_1 value: 28.177999999999997 - type: recall_at_10 value: 85.182 - type: recall_at_100 value: 96.174 - type: recall_at_1000 value: 98.848 - type: recall_at_3 value: 57.150999999999996 - type: recall_at_5 value: 71.50999999999999 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: None metrics: - type: accuracy value: 54.521 - type: f1 value: 52.53528052282081 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 
74.2003249023509 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 68.4277378629746 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 58.599999999999994 - type: map_at_10 value: 68.671 - type: map_at_100 value: 69.148 - type: map_at_1000 value: 69.157 - type: map_at_3 value: 66.9 - type: map_at_5 value: 68.045 - type: mrr_at_1 value: 58.599999999999994 - type: mrr_at_10 value: 68.671 - type: mrr_at_100 value: 69.148 - type: mrr_at_1000 value: 69.157 - type: mrr_at_3 value: 66.9 - type: mrr_at_5 value: 68.045 - type: ndcg_at_1 value: 58.599999999999994 - type: ndcg_at_10 value: 73.099 - type: ndcg_at_100 value: 75.33 - type: ndcg_at_1000 value: 75.58500000000001 - type: ndcg_at_3 value: 69.502 - type: ndcg_at_5 value: 71.542 - type: precision_at_1 value: 58.599999999999994 - type: precision_at_10 value: 8.68 - type: precision_at_100 value: 0.97 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 25.667 - type: precision_at_5 value: 16.38 - type: recall_at_1 value: 58.599999999999994 - type: recall_at_10 value: 86.8 - type: recall_at_100 value: 97.0 - type: recall_at_1000 value: 99.1 - type: recall_at_3 value: 77.0 - type: recall_at_5 value: 81.89999999999999 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: None metrics: - type: accuracy value: 89.58999999999999 - type: ap value: 75.69899834265364 - type: f1 value: 88.2026184757175
---

**News**

**[2024-04-22]** piccolo-large-zh-v2 currently ranks first on the C-MTEB leaderboard, about 1.9 points ahead of the previous best BERT model.

## piccolo-large-zh-v2

piccolo-large-zh-v2 is a general-purpose Chinese embedding model trained by the general model group at SenseTime Research. This upgrade of Piccolo focuses more on general downstream fine-tuning methods. We plan to release an updated technical report in the near future, and further technical details will be disclosed at the SenseTime Tech Day on April 23rd: https://www.sensetime.com/cn

## Usage

For now, the model can only be accessed through the API: https://platform.sensenova.cn/doc?path=/chat/Embeddings/Embeddings.md
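
Once embeddings have been retrieved from the API, a common next step is to compare them with cosine similarity, the measure behind the `cos_sim_*` metrics in this card's metadata. The sketch below covers only that comparison step and does not call the API; the request format is described at the link above and is not reproduced here. The helper names `cosine_similarity` and `rank_by_similarity` and the toy three-dimensional vectors are illustrative assumptions, not part of any SenseTime SDK, and real embeddings will have a much higher dimension.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for vectors returned by the embedding API (hypothetical values).
    query_vec = [0.2, 0.1, 0.9]
    passage_vecs = [
        [0.9, 0.1, 0.0],
        [0.1, 0.2, 0.8],
        [0.5, 0.5, 0.5],
    ]
    for index, score in rank_by_similarity(query_vec, passage_vecs):
        print(f"passage {index}: cosine similarity {score:.4f}")
```

For retrieval-style use, the query and each candidate passage would be embedded separately and the candidates sorted by score, as `rank_by_similarity` does here.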