--- library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - sentence-embedding - mteb - mteb model-index: - name: e433e634850d125d8b85bee76db3a3b6a0c3bf56 results: - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringP2P config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 56.88600728743999 - type: v_measures value: [0.5396081553520281, 0.6022872403200437, 0.5515205944691852, 0.5595772885785736, 0.5632413941951575] - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringS2S config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 38.199527329051804 - type: v_measures value: [0.42157254138936706, 0.36882298663461527, 0.3134327610337458, 0.40391031391690396, 0.3832775043562133] - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-alloprof-s2p name: MTEB AlloprofReranking config: default split: test revision: 65393d0d7a08a10b4e348135e824f385d420b0fd metrics: - type: map value: 68.73372257500206 - type: mrr value: 70.07434479260904 - type: nAUC_map_diff1 value: 50.95933484071007 - type: nAUC_map_max value: 13.75463910519138 - type: nAUC_mrr_diff1 value: 50.494303783447656 - type: nAUC_mrr_max value: 14.460935217916187 - task: type: Retrieval dataset: type: lyon-nlp/alloprof name: MTEB AlloprofRetrieval config: default split: test revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd metrics: - type: map_at_1 value: 21.675 - type: map_at_10 value: 32.274 - type: map_at_100 value: 33.316 - type: map_at_1000 value: 33.387 - type: map_at_20 value: 32.864 - type: map_at_3 value: 29.166999999999998 - type: map_at_5 value: 30.946 - type: mrr_at_1 value: 21.675302245250432 - type: mrr_at_10 value: 32.274309839076714 - type: mrr_at_100 value: 33.31571024590564 - type: mrr_at_1000 value: 
33.3868130424392 - type: mrr_at_20 value: 32.863978562081925 - type: mrr_at_3 value: 29.16666666666669 - type: mrr_at_5 value: 30.94559585492234 - type: nauc_map_at_1000_diff1 value: 34.85808309940442 - type: nauc_map_at_1000_max value: 31.058801579682825 - type: nauc_map_at_100_diff1 value: 34.842898344470846 - type: nauc_map_at_100_max value: 31.077561464904342 - type: nauc_map_at_10_diff1 value: 34.6773118480208 - type: nauc_map_at_10_max value: 30.8489850780642 - type: nauc_map_at_1_diff1 value: 40.65773695743684 - type: nauc_map_at_1_max value: 28.766036921254617 - type: nauc_map_at_20_diff1 value: 34.73935242577166 - type: nauc_map_at_20_max value: 31.03143938077287 - type: nauc_map_at_3_diff1 value: 35.12059625476991 - type: nauc_map_at_3_max value: 30.48787855768291 - type: nauc_map_at_5_diff1 value: 34.73453235094986 - type: nauc_map_at_5_max value: 30.3860304682398 - type: nauc_mrr_at_1000_diff1 value: 34.85808309940442 - type: nauc_mrr_at_1000_max value: 31.058801579682825 - type: nauc_mrr_at_100_diff1 value: 34.842898344470846 - type: nauc_mrr_at_100_max value: 31.077561464904342 - type: nauc_mrr_at_10_diff1 value: 34.6773118480208 - type: nauc_mrr_at_10_max value: 30.8489850780642 - type: nauc_mrr_at_1_diff1 value: 40.65773695743684 - type: nauc_mrr_at_1_max value: 28.766036921254617 - type: nauc_mrr_at_20_diff1 value: 34.73935242577166 - type: nauc_mrr_at_20_max value: 31.03143938077287 - type: nauc_mrr_at_3_diff1 value: 35.12059625476991 - type: nauc_mrr_at_3_max value: 30.48787855768291 - type: nauc_mrr_at_5_diff1 value: 34.73453235094986 - type: nauc_mrr_at_5_max value: 30.3860304682398 - type: nauc_ndcg_at_1000_diff1 value: 34.04342467121623 - type: nauc_ndcg_at_1000_max value: 32.311398352704686 - type: nauc_ndcg_at_100_diff1 value: 33.67278941726764 - type: nauc_ndcg_at_100_max value: 33.0229606203184 - type: nauc_ndcg_at_10_diff1 value: 32.93808280492078 - type: nauc_ndcg_at_10_max value: 32.07111775221638 - type: nauc_ndcg_at_1_diff1 value: 
40.65773695743684 - type: nauc_ndcg_at_1_max value: 28.766036921254617 - type: nauc_ndcg_at_20_diff1 value: 33.141323431064585 - type: nauc_ndcg_at_20_max value: 32.76436962238286 - type: nauc_ndcg_at_3_diff1 value: 33.77769745974645 - type: nauc_ndcg_at_3_max value: 31.072988073016912 - type: nauc_ndcg_at_5_diff1 value: 33.091582792245696 - type: nauc_ndcg_at_5_max value: 30.92378976230745 - type: nauc_precision_at_1000_diff1 value: 33.74743287990321 - type: nauc_precision_at_1000_max value: 60.08005213097628 - type: nauc_precision_at_100_diff1 value: 28.869275501873236 - type: nauc_precision_at_100_max value: 46.35483380447927 - type: nauc_precision_at_10_diff1 value: 27.910043146581497 - type: nauc_precision_at_10_max value: 36.07399824307888 - type: nauc_precision_at_1_diff1 value: 40.65773695743684 - type: nauc_precision_at_1_max value: 28.766036921254617 - type: nauc_precision_at_20_diff1 value: 28.144265629196163 - type: nauc_precision_at_20_max value: 39.60361579056115 - type: nauc_precision_at_3_diff1 value: 30.31893725671278 - type: nauc_precision_at_3_max value: 32.63695126407254 - type: nauc_precision_at_5_diff1 value: 28.699678130380235 - type: nauc_precision_at_5_max value: 32.37908851919098 - type: nauc_recall_at_1000_diff1 value: 33.74743287990342 - type: nauc_recall_at_1000_max value: 60.080052130975346 - type: nauc_recall_at_100_diff1 value: 28.869275501873247 - type: nauc_recall_at_100_max value: 46.35483380447917 - type: nauc_recall_at_10_diff1 value: 27.910043146581508 - type: nauc_recall_at_10_max value: 36.07399824307888 - type: nauc_recall_at_1_diff1 value: 40.65773695743684 - type: nauc_recall_at_1_max value: 28.766036921254617 - type: nauc_recall_at_20_diff1 value: 28.14426562919617 - type: nauc_recall_at_20_max value: 39.60361579056118 - type: nauc_recall_at_3_diff1 value: 30.318937256712804 - type: nauc_recall_at_3_max value: 32.63695126407256 - type: nauc_recall_at_5_diff1 value: 28.699678130380224 - type: nauc_recall_at_5_max value: 
32.37908851919102 - type: ndcg_at_1 value: 21.675 - type: ndcg_at_10 value: 38.06 - type: ndcg_at_100 value: 43.491 - type: ndcg_at_1000 value: 45.432 - type: ndcg_at_20 value: 40.217000000000006 - type: ndcg_at_3 value: 31.642 - type: ndcg_at_5 value: 34.837 - type: precision_at_1 value: 21.675 - type: precision_at_10 value: 5.652 - type: precision_at_100 value: 0.827 - type: precision_at_1000 value: 0.098 - type: precision_at_20 value: 3.253 - type: precision_at_3 value: 12.939 - type: precision_at_5 value: 9.309000000000001 - type: recall_at_1 value: 21.675 - type: recall_at_10 value: 56.52 - type: recall_at_100 value: 82.729 - type: recall_at_1000 value: 98.1 - type: recall_at_20 value: 65.069 - type: recall_at_3 value: 38.817 - type: recall_at_5 value: 46.546 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.51 - type: f1 value: 41.3284674671926 - type: f1_weighted value: 41.3284674671926 - task: type: Retrieval dataset: type: maastrichtlawtech/bsard name: MTEB BSARDRetrieval config: default split: test revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 metrics: - type: map_at_1 value: 5.405 - type: map_at_10 value: 9.008 - type: map_at_100 value: 9.932 - type: map_at_1000 value: 10.042 - type: map_at_20 value: 9.389 - type: map_at_3 value: 7.883 - type: map_at_5 value: 8.626000000000001 - type: mrr_at_1 value: 5.405405405405405 - type: mrr_at_10 value: 9.007579007579007 - type: mrr_at_100 value: 9.931517094611667 - type: mrr_at_1000 value: 10.0416462267215 - type: mrr_at_20 value: 9.38869595990339 - type: mrr_at_3 value: 7.882882882882883 - type: mrr_at_5 value: 8.626126126126126 - type: nauc_map_at_1000_diff1 value: 23.53549434486455 - type: nauc_map_at_1000_max value: 9.977010641647402 - type: nauc_map_at_100_diff1 value: 23.50007884241435 - type: nauc_map_at_100_max value: 
9.984274734441085 - type: nauc_map_at_10_diff1 value: 24.69444512826233 - type: nauc_map_at_10_max value: 9.726162724771594 - type: nauc_map_at_1_diff1 value: 40.88188899137848 - type: nauc_map_at_1_max value: 12.044739470755896 - type: nauc_map_at_20_diff1 value: 23.833757177107557 - type: nauc_map_at_20_max value: 9.94328216894336 - type: nauc_map_at_3_diff1 value: 28.320570164876653 - type: nauc_map_at_3_max value: 11.195397944839767 - type: nauc_map_at_5_diff1 value: 25.86894200735248 - type: nauc_map_at_5_max value: 8.43950569758736 - type: nauc_mrr_at_1000_diff1 value: 23.53549434486455 - type: nauc_mrr_at_1000_max value: 9.977010641647402 - type: nauc_mrr_at_100_diff1 value: 23.50007884241435 - type: nauc_mrr_at_100_max value: 9.984274734441085 - type: nauc_mrr_at_10_diff1 value: 24.69444512826233 - type: nauc_mrr_at_10_max value: 9.726162724771594 - type: nauc_mrr_at_1_diff1 value: 40.88188899137848 - type: nauc_mrr_at_1_max value: 12.044739470755896 - type: nauc_mrr_at_20_diff1 value: 23.833757177107557 - type: nauc_mrr_at_20_max value: 9.94328216894336 - type: nauc_mrr_at_3_diff1 value: 28.320570164876653 - type: nauc_mrr_at_3_max value: 11.195397944839767 - type: nauc_mrr_at_5_diff1 value: 25.86894200735248 - type: nauc_mrr_at_5_max value: 8.43950569758736 - type: nauc_ndcg_at_1000_diff1 value: 15.939402272339343 - type: nauc_ndcg_at_1000_max value: 10.076089125537772 - type: nauc_ndcg_at_100_diff1 value: 16.12740122067642 - type: nauc_ndcg_at_100_max value: 10.39935154464689 - type: nauc_ndcg_at_10_diff1 value: 20.455941061369295 - type: nauc_ndcg_at_10_max value: 9.350349883274461 - type: nauc_ndcg_at_1_diff1 value: 40.88188899137848 - type: nauc_ndcg_at_1_max value: 12.044739470755896 - type: nauc_ndcg_at_20_diff1 value: 18.267195122936364 - type: nauc_ndcg_at_20_max value: 10.211299135510837 - type: nauc_ndcg_at_3_diff1 value: 26.453038443158267 - type: nauc_ndcg_at_3_max value: 10.628723618231271 - type: nauc_ndcg_at_5_diff1 value: 
22.815939702854084 - type: nauc_ndcg_at_5_max value: 6.308794763068443 - type: nauc_precision_at_1000_diff1 value: -7.915540524594587 - type: nauc_precision_at_1000_max value: 10.441250503021037 - type: nauc_precision_at_100_diff1 value: 2.7415108070462253 - type: nauc_precision_at_100_max value: 11.957692005514204 - type: nauc_precision_at_10_diff1 value: 12.731449206012213 - type: nauc_precision_at_10_max value: 9.218464561250887 - type: nauc_precision_at_1_diff1 value: 40.88188899137848 - type: nauc_precision_at_1_max value: 12.044739470755896 - type: nauc_precision_at_20_diff1 value: 8.658189595700664 - type: nauc_precision_at_20_max value: 11.571072137198621 - type: nauc_precision_at_3_diff1 value: 22.7637681983756 - type: nauc_precision_at_3_max value: 9.361635703809425 - type: nauc_precision_at_5_diff1 value: 17.02002973192349 - type: nauc_precision_at_5_max value: 1.8844406919262011 - type: nauc_recall_at_1000_diff1 value: -7.915540524594531 - type: nauc_recall_at_1000_max value: 10.441250503021028 - type: nauc_recall_at_100_diff1 value: 2.741510807046166 - type: nauc_recall_at_100_max value: 11.957692005514156 - type: nauc_recall_at_10_diff1 value: 12.731449206012224 - type: nauc_recall_at_10_max value: 9.218464561250883 - type: nauc_recall_at_1_diff1 value: 40.88188899137848 - type: nauc_recall_at_1_max value: 12.044739470755896 - type: nauc_recall_at_20_diff1 value: 8.65818959570063 - type: nauc_recall_at_20_max value: 11.571072137198572 - type: nauc_recall_at_3_diff1 value: 22.763768198375587 - type: nauc_recall_at_3_max value: 9.361635703809409 - type: nauc_recall_at_5_diff1 value: 17.02002973192351 - type: nauc_recall_at_5_max value: 1.8844406919262173 - type: ndcg_at_1 value: 5.405 - type: ndcg_at_10 value: 11.045 - type: ndcg_at_100 value: 16.724 - type: ndcg_at_1000 value: 20.325 - type: ndcg_at_20 value: 12.42 - type: ndcg_at_3 value: 8.746 - type: ndcg_at_5 value: 10.065 - type: precision_at_1 value: 5.405 - type: precision_at_10 value: 1.757 - 
type: precision_at_100 value: 0.468 - type: precision_at_1000 value: 0.077 - type: precision_at_20 value: 1.149 - type: precision_at_3 value: 3.7539999999999996 - type: precision_at_5 value: 2.883 - type: recall_at_1 value: 5.405 - type: recall_at_10 value: 17.568 - type: recall_at_100 value: 46.847 - type: recall_at_1000 value: 76.577 - type: recall_at_20 value: 22.973 - type: recall_at_3 value: 11.261000000000001 - type: recall_at_5 value: 14.414 - task: type: Clustering dataset: type: lyon-nlp/clustering-hal-s2s name: MTEB HALClusteringS2S config: default split: test revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 metrics: - type: v_measure value: 24.495384349905265 - type: v_measures value: [0.2850587858600384, 0.274086904447773, 0.2446866774990972, 0.26946100959565517, 0.24156528297396174] - task: type: Clustering dataset: type: reciTAL/mlsum name: MTEB MLSUMClusteringP2P config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 41.7878688793447 - type: v_measures value: [0.4201324393825989, 0.4205306567437461, 0.4221300501395374, 0.4210735177933313, 0.38124298228695813] - task: type: Clustering dataset: type: reciTAL/mlsum name: MTEB MLSUMClusteringS2S config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 41.54533473611554 - type: v_measures value: [0.3978917671338969, 0.42610299599987944, 0.4152131658150196, 0.40558711021249855, 0.38327501252308305] - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 85.33041027247104 - type: f1 value: 85.4043088703478 - type: f1_weighted value: 85.22086763441686 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: 
accuracy value: 59.01346695897275 - type: f1 value: 41.296845063208316 - type: f1_weighted value: 61.793813202867696 - task: type: Classification dataset: type: mteb/masakhanews name: MTEB MasakhaNEWSClassification (fra) config: fra split: test revision: 18193f187b92da67168c655c9973a165ed9593dd metrics: - type: accuracy value: 72.60663507109004 - type: f1 value: 68.67522100429781 - type: f1_weighted value: 72.75616093668002 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 49.17691007381563 - type: v_measures value: [1.0, 0.033833191750480725, 0.5707463198244268, 0.1318223737892885, 0.7224436183265853] - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 26.9350763881635 - type: v_measures value: [1.0, 0.0002883507347309009, 0.18259625098776155, 0.025306110065234755, 0.1385631076204479] - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 4672e20407010da34463acc759c162ca9734bca6 metrics: - type: accuracy value: 65.1546738399462 - type: f1 value: 62.81367149102006 - type: f1_weighted value: 64.45478181518959 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 metrics: - type: accuracy value: 69.94283792871553 - type: f1 value: 69.3387310036327 - type: f1_weighted value: 69.77979200675047 - task: type: Retrieval dataset: type: jinaai/mintakaqa name: MTEB MintakaRetrieval (fr) config: fr split: test revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e metrics: - type: map_at_1 value: 14.536999999999999 - type: map_at_10 
value: 22.972 - type: map_at_100 value: 24.046 - type: map_at_1000 value: 24.15 - type: map_at_20 value: 23.56 - type: map_at_3 value: 20.639 - type: map_at_5 value: 21.886 - type: mrr_at_1 value: 14.537264537264537 - type: mrr_at_10 value: 22.97172172172171 - type: mrr_at_100 value: 24.04581030084757 - type: mrr_at_1000 value: 24.15012351833827 - type: mrr_at_20 value: 23.559920001131612 - type: mrr_at_3 value: 20.63882063882061 - type: mrr_at_5 value: 21.88574938574935 - type: nauc_map_at_1000_diff1 value: 25.172495501911456 - type: nauc_map_at_1000_max value: 39.07442097828252 - type: nauc_map_at_100_diff1 value: 25.129142743145884 - type: nauc_map_at_100_max value: 39.03725272182565 - type: nauc_map_at_10_diff1 value: 25.52237435145409 - type: nauc_map_at_10_max value: 39.5761256079619 - type: nauc_map_at_1_diff1 value: 31.68506359690787 - type: nauc_map_at_1_max value: 39.251552013635425 - type: nauc_map_at_20_diff1 value: 25.223544981725286 - type: nauc_map_at_20_max value: 39.20307777977743 - type: nauc_map_at_3_diff1 value: 26.5913043939904 - type: nauc_map_at_3_max value: 40.38909639557377 - type: nauc_map_at_5_diff1 value: 25.90291761511258 - type: nauc_map_at_5_max value: 40.08746876057708 - type: nauc_mrr_at_1000_diff1 value: 25.172495501911456 - type: nauc_mrr_at_1000_max value: 39.07442097828252 - type: nauc_mrr_at_100_diff1 value: 25.129142743145884 - type: nauc_mrr_at_100_max value: 39.03725272182565 - type: nauc_mrr_at_10_diff1 value: 25.52237435145409 - type: nauc_mrr_at_10_max value: 39.5761256079619 - type: nauc_mrr_at_1_diff1 value: 31.68506359690787 - type: nauc_mrr_at_1_max value: 39.251552013635425 - type: nauc_mrr_at_20_diff1 value: 25.223544981725286 - type: nauc_mrr_at_20_max value: 39.20307777977743 - type: nauc_mrr_at_3_diff1 value: 26.5913043939904 - type: nauc_mrr_at_3_max value: 40.38909639557377 - type: nauc_mrr_at_5_diff1 value: 25.90291761511258 - type: nauc_mrr_at_5_max value: 40.08746876057708 - type: nauc_ndcg_at_1000_diff1 
value: 23.22275566961323 - type: nauc_ndcg_at_1000_max value: 37.77760760027764 - type: nauc_ndcg_at_100_diff1 value: 21.715763741257927 - type: nauc_ndcg_at_100_max value: 36.46541121995108 - type: nauc_ndcg_at_10_diff1 value: 23.278761630662373 - type: nauc_ndcg_at_10_max value: 38.7930407055593 - type: nauc_ndcg_at_1_diff1 value: 31.68506359690787 - type: nauc_ndcg_at_1_max value: 39.251552013635425 - type: nauc_ndcg_at_20_diff1 value: 22.247483519405314 - type: nauc_ndcg_at_20_max value: 37.52699283756433 - type: nauc_ndcg_at_3_diff1 value: 25.285332146360567 - type: nauc_ndcg_at_3_max value: 40.49755286945492 - type: nauc_ndcg_at_5_diff1 value: 24.188132420084607 - type: nauc_ndcg_at_5_max value: 40.023420096094924 - type: nauc_precision_at_1000_diff1 value: 22.011383616462943 - type: nauc_precision_at_1000_max value: 33.1171975223399 - type: nauc_precision_at_100_diff1 value: 8.869925191243802 - type: nauc_precision_at_100_max value: 24.642097404720463 - type: nauc_precision_at_10_diff1 value: 17.74075352930919 - type: nauc_precision_at_10_max value: 36.488352516736775 - type: nauc_precision_at_1_diff1 value: 31.68506359690787 - type: nauc_precision_at_1_max value: 39.251552013635425 - type: nauc_precision_at_20_diff1 value: 14.092673370526898 - type: nauc_precision_at_20_max value: 32.16083119966346 - type: nauc_precision_at_3_diff1 value: 22.16344389106631 - type: nauc_precision_at_3_max value: 40.70883095791623 - type: nauc_precision_at_5_diff1 value: 20.119543069972256 - type: nauc_precision_at_5_max value: 39.79763147435235 - type: nauc_recall_at_1000_diff1 value: 22.011383616462528 - type: nauc_recall_at_1000_max value: 33.117197522340085 - type: nauc_recall_at_100_diff1 value: 8.869925191243775 - type: nauc_recall_at_100_max value: 24.64209740472041 - type: nauc_recall_at_10_diff1 value: 17.740753529309178 - type: nauc_recall_at_10_max value: 36.48835251673679 - type: nauc_recall_at_1_diff1 value: 31.68506359690787 - type: nauc_recall_at_1_max value: 
39.251552013635425 - type: nauc_recall_at_20_diff1 value: 14.092673370526915 - type: nauc_recall_at_20_max value: 32.160831199663455 - type: nauc_recall_at_3_diff1 value: 22.163443891066322 - type: nauc_recall_at_3_max value: 40.708830957916234 - type: nauc_recall_at_5_diff1 value: 20.119543069972217 - type: nauc_recall_at_5_max value: 39.79763147435234 - type: ndcg_at_1 value: 14.536999999999999 - type: ndcg_at_10 value: 27.485 - type: ndcg_at_100 value: 33.206 - type: ndcg_at_1000 value: 36.382999999999996 - type: ndcg_at_20 value: 29.635 - type: ndcg_at_3 value: 22.597 - type: ndcg_at_5 value: 24.851 - type: precision_at_1 value: 14.536999999999999 - type: precision_at_10 value: 4.189 - type: precision_at_100 value: 0.698 - type: precision_at_1000 value: 0.096 - type: precision_at_20 value: 2.52 - type: precision_at_3 value: 9.419 - type: precision_at_5 value: 6.749 - type: recall_at_1 value: 14.536999999999999 - type: recall_at_10 value: 41.892 - type: recall_at_100 value: 69.779 - type: recall_at_1000 value: 95.61800000000001 - type: recall_at_20 value: 50.41 - type: recall_at_3 value: 28.255999999999997 - type: recall_at_5 value: 33.743 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (fr) config: fr split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_accuracy value: 81.74386920980926 - type: cos_sim_ap value: 93.18281680904117 - type: cos_sim_f1 value: 87.37233054781802 - type: cos_sim_precision value: 82.04010462074979 - type: cos_sim_recall value: 93.44587884806356 - type: dot_accuracy value: 81.74386920980926 - type: dot_ap value: 93.18281680904117 - type: dot_f1 value: 87.37233054781802 - type: dot_precision value: 82.04010462074979 - type: dot_recall value: 93.44587884806356 - type: euclidean_accuracy value: 81.74386920980926 - type: euclidean_ap value: 93.18281680904117 - type: euclidean_f1 value: 87.37233054781802 - type: euclidean_precision value: 82.04010462074979 - type: 
euclidean_recall value: 93.44587884806356 - type: manhattan_accuracy value: 81.74386920980926 - type: manhattan_ap value: 93.17517480971131 - type: manhattan_f1 value: 87.37864077669903 - type: manhattan_precision value: 81.74740484429066 - type: manhattan_recall value: 93.84309831181727 - type: max_accuracy value: 81.74386920980926 - type: max_ap value: 93.18281680904117 - type: max_f1 value: 87.37864077669903 - task: type: PairClassification dataset: type: google-research-datasets/paws-x name: MTEB PawsX (fr) config: fr split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_accuracy value: 61.1 - type: cos_sim_ap value: 60.75603519868964 - type: cos_sim_f1 value: 62.78646780647509 - type: cos_sim_precision value: 46.74972914409534 - type: cos_sim_recall value: 95.5703211517165 - type: dot_accuracy value: 61.1 - type: dot_ap value: 60.74807680023078 - type: dot_f1 value: 62.78646780647509 - type: dot_precision value: 46.74972914409534 - type: dot_recall value: 95.5703211517165 - type: euclidean_accuracy value: 61.1 - type: euclidean_ap value: 60.756144387817734 - type: euclidean_f1 value: 62.78646780647509 - type: euclidean_precision value: 46.74972914409534 - type: euclidean_recall value: 95.5703211517165 - type: manhattan_accuracy value: 61.150000000000006 - type: manhattan_ap value: 60.685188544775116 - type: manhattan_f1 value: 62.7721335268505 - type: manhattan_precision value: 46.6810577441986 - type: manhattan_recall value: 95.79180509413068 - type: max_accuracy value: 61.150000000000006 - type: max_ap value: 60.756144387817734 - type: max_f1 value: 62.78646780647509 - task: type: STS dataset: type: Lajavaness/SICK-fr name: MTEB SICKFr config: default split: test revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a metrics: - type: cos_sim_pearson value: 83.1543597030015 - type: cos_sim_spearman value: 77.10092303546944 - type: euclidean_pearson value: 80.27115846915481 - type: euclidean_spearman value: 77.10092516058822 - 
type: manhattan_pearson value: 80.30090425968062 - type: manhattan_spearman value: 77.09423647945061 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cos_sim_pearson value: 79.20797144286122 - type: cos_sim_spearman value: 80.31452099282514 - type: euclidean_pearson value: 78.43621396282957 - type: euclidean_spearman value: 80.31452099282514 - type: manhattan_pearson value: 78.29678738374866 - type: manhattan_spearman value: 79.93185465249057 - task: type: STS dataset: type: PhilipMay/stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (fr) config: fr split: test revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c metrics: - type: cos_sim_pearson value: 84.69215133897265 - type: cos_sim_spearman value: 84.35617480959016 - type: euclidean_pearson value: 83.85371663492563 - type: euclidean_spearman value: 84.35617480959016 - type: manhattan_pearson value: 83.85857789722276 - type: manhattan_spearman value: 84.30794186513978 - task: type: Summarization dataset: type: lyon-nlp/summarization-summeval-fr-p2p name: MTEB SummEvalFr config: default split: test revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 metrics: - type: cos_sim_pearson value: 29.187176809104393 - type: cos_sim_spearman value: 29.65160679657583 - type: dot_pearson value: 29.18717349611766 - type: dot_spearman value: 29.65160679657583 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-syntec-s2p name: MTEB SyntecReranking config: default split: test revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad metrics: - type: map value: 82.76666666666667 - type: mrr value: 82.76666666666667 - type: nAUC_map_diff1 value: 52.548913230162405 - type: nAUC_map_max value: -2.824065935620183 - type: nAUC_mrr_diff1 value: 52.548913230162405 - type: nAUC_mrr_max value: -2.824065935620183 - task: type: Retrieval dataset: type: lyon-nlp/mteb-fr-retrieval-syntec-s2p name: MTEB 
SyntecRetrieval config: default split: test revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9 metrics: - type: map_at_1 value: 57.99999999999999 - type: map_at_10 value: 72.356 - type: map_at_100 value: 72.625 - type: map_at_1000 value: 72.625 - type: map_at_20 value: 72.625 - type: map_at_3 value: 70.333 - type: map_at_5 value: 71.48299999999999 - type: mrr_at_1 value: 57.99999999999999 - type: mrr_at_10 value: 72.35634920634922 - type: mrr_at_100 value: 72.62532693914275 - type: mrr_at_1000 value: 72.62532693914275 - type: mrr_at_20 value: 72.62532693914275 - type: mrr_at_3 value: 70.33333333333333 - type: mrr_at_5 value: 71.48333333333333 - type: nauc_map_at_1000_diff1 value: 57.27081552588017 - type: nauc_map_at_1000_max value: 13.401922890723771 - type: nauc_map_at_100_diff1 value: 57.27081552588017 - type: nauc_map_at_100_max value: 13.401922890723771 - type: nauc_map_at_10_diff1 value: 57.39952453922188 - type: nauc_map_at_10_max value: 14.093164837730344 - type: nauc_map_at_1_diff1 value: 57.23800679107291 - type: nauc_map_at_1_max value: 11.039846765533865 - type: nauc_map_at_20_diff1 value: 57.27081552588017 - type: nauc_map_at_20_max value: 13.401922890723771 - type: nauc_map_at_3_diff1 value: 58.14875247321224 - type: nauc_map_at_3_max value: 14.538312305676238 - type: nauc_map_at_5_diff1 value: 57.34940275695991 - type: nauc_map_at_5_max value: 13.675180459395065 - type: nauc_mrr_at_1000_diff1 value: 57.27081552588017 - type: nauc_mrr_at_1000_max value: 13.401922890723771 - type: nauc_mrr_at_100_diff1 value: 57.27081552588017 - type: nauc_mrr_at_100_max value: 13.401922890723771 - type: nauc_mrr_at_10_diff1 value: 57.39952453922188 - type: nauc_mrr_at_10_max value: 14.093164837730344 - type: nauc_mrr_at_1_diff1 value: 57.23800679107291 - type: nauc_mrr_at_1_max value: 11.039846765533865 - type: nauc_mrr_at_20_diff1 value: 57.27081552588017 - type: nauc_mrr_at_20_max value: 13.401922890723771 - type: nauc_mrr_at_3_diff1 value: 58.14875247321224 - type: 
nauc_mrr_at_3_max value: 14.538312305676238 - type: nauc_mrr_at_5_diff1 value: 57.34940275695991 - type: nauc_mrr_at_5_max value: 13.675180459395065 - type: nauc_ndcg_at_1000_diff1 value: 57.38511684819052 - type: nauc_ndcg_at_1000_max value: 13.993185568467656 - type: nauc_ndcg_at_100_diff1 value: 57.38511684819052 - type: nauc_ndcg_at_100_max value: 13.993185568467656 - type: nauc_ndcg_at_10_diff1 value: 57.93396526410134 - type: nauc_ndcg_at_10_max value: 17.16319020800824 - type: nauc_ndcg_at_1_diff1 value: 57.23800679107291 - type: nauc_ndcg_at_1_max value: 11.039846765533865 - type: nauc_ndcg_at_20_diff1 value: 57.38511684819052 - type: nauc_ndcg_at_20_max value: 13.993185568467656 - type: nauc_ndcg_at_3_diff1 value: 59.36410104940948 - type: nauc_ndcg_at_3_max value: 17.128826753860732 - type: nauc_ndcg_at_5_diff1 value: 57.71094150714742 - type: nauc_ndcg_at_5_max value: 15.62784584334318 - type: nauc_precision_at_1000_diff1 value: nan - type: nauc_precision_at_1000_max value: nan - type: nauc_precision_at_100_diff1 value: nan - type: nauc_precision_at_100_max value: nan - type: nauc_precision_at_10_diff1 value: 66.79505135387465 - type: nauc_precision_at_10_max value: 70.47152194211033 - type: nauc_precision_at_1_diff1 value: 57.23800679107291 - type: nauc_precision_at_1_max value: 11.039846765533865 - type: nauc_precision_at_20_diff1 value: 100.0 - type: nauc_precision_at_20_max value: 100.0 - type: nauc_precision_at_3_diff1 value: 65.65896518060521 - type: nauc_precision_at_3_max value: 30.198503091441538 - type: nauc_precision_at_5_diff1 value: 60.04201680672288 - type: nauc_precision_at_5_max value: 29.000933706816145 - type: nauc_recall_at_1000_diff1 value: nan - type: nauc_recall_at_1000_max value: nan - type: nauc_recall_at_100_diff1 value: nan - type: nauc_recall_at_100_max value: nan - type: nauc_recall_at_10_diff1 value: 66.7950513538749 - type: nauc_recall_at_10_max value: 70.47152194211012 - type: nauc_recall_at_1_diff1 value: 57.23800679107291 
- type: nauc_recall_at_1_max value: 11.039846765533865 - type: nauc_recall_at_20_diff1 value: nan - type: nauc_recall_at_20_max value: nan - type: nauc_recall_at_3_diff1 value: 65.65896518060525 - type: nauc_recall_at_3_max value: 30.19850309144154 - type: nauc_recall_at_5_diff1 value: 60.0420168067226 - type: nauc_recall_at_5_max value: 29.000933706816 - type: ndcg_at_1 value: 57.99999999999999 - type: ndcg_at_10 value: 78.19800000000001 - type: ndcg_at_100 value: 79.199 - type: ndcg_at_1000 value: 79.199 - type: ndcg_at_20 value: 79.199 - type: ndcg_at_3 value: 74.119 - type: ndcg_at_5 value: 76.184 - type: precision_at_1 value: 57.99999999999999 - type: precision_at_10 value: 9.6 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 28.333000000000002 - type: precision_at_5 value: 18.0 - type: recall_at_1 value: 57.99999999999999 - type: recall_at_10 value: 96.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 100.0 - type: recall_at_3 value: 85.0 - type: recall_at_5 value: 90.0 - task: type: Retrieval dataset: type: jinaai/xpqa name: MTEB XPQARetrieval (fr) config: fr split: test revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f metrics: - type: map_at_1 value: 35.256 - type: map_at_10 value: 54.071999999999996 - type: map_at_100 value: 55.435 - type: map_at_1000 value: 55.53 - type: map_at_20 value: 54.855 - type: map_at_3 value: 48.762 - type: map_at_5 value: 51.949999999999996 - type: mrr_at_1 value: 56.34178905206942 - type: mrr_at_10 value: 63.30843240723078 - type: mrr_at_100 value: 63.92076387626982 - type: mrr_at_1000 value: 63.9435076251571 - type: mrr_at_20 value: 63.64110365119446 - type: mrr_at_3 value: 61.526479750778805 - type: mrr_at_5 value: 62.38762794837559 - type: nauc_map_at_1000_diff1 value: 45.88957885553053 - type: nauc_map_at_1000_max value: 52.59013482565773 - type: nauc_map_at_100_diff1 value: 
45.84948517422948 - type: nauc_map_at_100_max value: 52.55839985303019 - type: nauc_map_at_10_diff1 value: 45.763486819482196 - type: nauc_map_at_10_max value: 52.09054118600712 - type: nauc_map_at_1_diff1 value: 55.521911317670835 - type: nauc_map_at_1_max value: 34.68779817675579 - type: nauc_map_at_20_diff1 value: 45.757369615751884 - type: nauc_map_at_20_max value: 52.44708031434436 - type: nauc_map_at_3_diff1 value: 47.798733616712056 - type: nauc_map_at_3_max value: 46.87976781177451 - type: nauc_map_at_5_diff1 value: 46.215964363315884 - type: nauc_map_at_5_max value: 50.5765276342371 - type: nauc_mrr_at_1000_diff1 value: 55.110400510640766 - type: nauc_mrr_at_1000_max value: 62.66171179919574 - type: nauc_mrr_at_100_diff1 value: 55.10166012000449 - type: nauc_mrr_at_100_max value: 62.66269343813773 - type: nauc_mrr_at_10_diff1 value: 55.087629594751256 - type: nauc_mrr_at_10_max value: 62.69978067726044 - type: nauc_mrr_at_1_diff1 value: 57.446957773325956 - type: nauc_mrr_at_1_max value: 63.22109004948565 - type: nauc_mrr_at_20_diff1 value: 55.067208283222016 - type: nauc_mrr_at_20_max value: 62.66935664582939 - type: nauc_mrr_at_3_diff1 value: 55.18870023658262 - type: nauc_mrr_at_3_max value: 62.597473549957996 - type: nauc_mrr_at_5_diff1 value: 54.87651100155316 - type: nauc_mrr_at_5_max value: 62.72845534030979 - type: nauc_ndcg_at_1000_diff1 value: 47.81162759706491 - type: nauc_ndcg_at_1000_max value: 56.26337910947683 - type: nauc_ndcg_at_100_diff1 value: 47.119077388160676 - type: nauc_ndcg_at_100_max value: 55.82354642959063 - type: nauc_ndcg_at_10_diff1 value: 46.784535879466496 - type: nauc_ndcg_at_10_max value: 54.63437116703429 - type: nauc_ndcg_at_1_diff1 value: 57.446957773325956 - type: nauc_ndcg_at_1_max value: 63.22109004948565 - type: nauc_ndcg_at_20_diff1 value: 46.756211545478905 - type: nauc_ndcg_at_20_max value: 55.228917899613826 - type: nauc_ndcg_at_3_diff1 value: 47.66168453462149 - type: nauc_ndcg_at_3_max value: 
54.39836405112981 - type: nauc_ndcg_at_5_diff1 value: 46.97491630908418 - type: nauc_ndcg_at_5_max value: 53.284362953526184 - type: nauc_precision_at_1000_diff1 value: -14.959536048875451 - type: nauc_precision_at_1000_max value: 19.740731727610537 - type: nauc_precision_at_100_diff1 value: -10.329364912432421 - type: nauc_precision_at_100_max value: 27.80165890502952 - type: nauc_precision_at_10_diff1 value: 0.7865296687777561 - type: nauc_precision_at_10_max value: 38.46291415400641 - type: nauc_precision_at_1_diff1 value: 57.446957773325956 - type: nauc_precision_at_1_max value: 63.22109004948565 - type: nauc_precision_at_20_diff1 value: -2.2696079664009385 - type: nauc_precision_at_20_max value: 35.38696590671127 - type: nauc_precision_at_3_diff1 value: 14.016444043719714 - type: nauc_precision_at_3_max value: 46.68119169258843 - type: nauc_precision_at_5_diff1 value: 6.466134759646741 - type: nauc_precision_at_5_max value: 43.245171983039256 - type: nauc_recall_at_1000_diff1 value: 10.588340380461794 - type: nauc_recall_at_1000_max value: 45.913607560926515 - type: nauc_recall_at_100_diff1 value: 28.995302681864565 - type: nauc_recall_at_100_max value: 42.67608149089844 - type: nauc_recall_at_10_diff1 value: 38.958724392572854 - type: nauc_recall_at_10_max value: 47.455666375173315 - type: nauc_recall_at_1_diff1 value: 55.521911317670835 - type: nauc_recall_at_1_max value: 34.68779817675579 - type: nauc_recall_at_20_diff1 value: 36.623788206732016 - type: nauc_recall_at_20_max value: 46.654888587980174 - type: nauc_recall_at_3_diff1 value: 43.46749373705754 - type: nauc_recall_at_3_max value: 42.55592784672105 - type: nauc_recall_at_5_diff1 value: 40.49018957054939 - type: nauc_recall_at_5_max value: 46.86884862874594 - type: ndcg_at_1 value: 56.342000000000006 - type: ndcg_at_10 value: 60.01800000000001 - type: ndcg_at_100 value: 65.182 - type: ndcg_at_1000 value: 66.809 - type: ndcg_at_20 value: 61.982000000000006 - type: ndcg_at_3 value: 55.688 - type: 
ndcg_at_5 value: 56.607 - type: precision_at_1 value: 56.342000000000006 - type: precision_at_10 value: 14.005 - type: precision_at_100 value: 1.821 - type: precision_at_1000 value: 0.20500000000000002 - type: precision_at_20 value: 7.684 - type: precision_at_3 value: 34.089999999999996 - type: precision_at_5 value: 24.005000000000003 - type: recall_at_1 value: 35.256 - type: recall_at_10 value: 67.583 - type: recall_at_100 value: 88.74300000000001 - type: recall_at_1000 value: 99.163 - type: recall_at_20 value: 73.87 - type: recall_at_3 value: 53.371 - type: recall_at_5 value: 59.399
license: apache-2.0
---

# [bilingual-embedding-base](https://huggingface.co/Lajavaness/bilingual-embedding-base)

This repo is a fork of the original [Lajavaness/bilingual-embedding-base](https://huggingface.co/Lajavaness/bilingual-embedding-base). The only difference is the model type name, which was changed for compatibility with text-embeddings-inference.

Bilingual-embedding is a sentence-embedding model for two languages: French and English. It is built on [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-base), a pre-trained multilingual language model, and is fine-tuned specifically for French and English text. The model encodes English and French sentences into a 768-dimensional vector space, which supports applications ranging from semantic search to text clustering. The embeddings are intended to capture both the lexical and the contextual meaning of a sentence.
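
As a rough illustration of how such embeddings are used for semantic search, the sketch below ranks a small corpus by cosine similarity to a query. The helper names (`l2_normalize`, `rank_by_cosine`) and the toy 3-dimensional vectors are assumptions made for this example; they are not part of the model or of any library, and real embeddings would come from the model itself.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_by_cosine(query, corpus):
    """Return (index, cosine similarity) pairs sorted from most to least similar."""
    q = l2_normalize(query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = l2_normalize(doc)
        # For unit-length vectors the dot product equals the cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real sentence embeddings.
query = [1.0, 0.0, 1.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

ranking = rank_by_cosine(query, corpus)
for idx, score in ranking:
    print(idx, round(score, 3))
```

Since the architecture listed in this card ends with a `Normalize()` module, the model's outputs should already be unit length, so the explicit normalization step here is only a safeguard for arbitrary input vectors.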
## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Training and Fine-tuning process

### Stage 1: NLI Training
- Dataset: SNLI + XNLI (English and French)
- Method: Training with Multiple Negatives Ranking Loss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics.

### Stage 3: Continued Fine-tuning for Semantic Textual Similarity on STS Benchmark
- Dataset: STSB (French and English)
- Method: Fine-tuning for the semantic textual similarity benchmark using Siamese BERT-Networks, configured with the 'sentence-transformers' library.

### Stage 4: Advanced Augmentation Fine-tuning
- Dataset: STSB, augmented by generating [silver samples from gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
- Method: [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, combining Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data with the generated pairs, with the aim of improving the model's robustness and accuracy.
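
To make the Stage 1 objective more concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking loss over a batch of (anchor, positive) embedding pairs: each anchor should score its own positive higher than every other positive in the batch. This is an illustration only, not the authors' training code. The function names and toy vectors are hypothetical, and the `scale=20.0` value is an assumption for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of the true pair.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-dimensional embeddings for two sentence pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

The loss is close to zero when each anchor is most similar to its own positive and grows large when the pairs are mismatched, which is the ranking behaviour this kind of objective rewards.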
## Usage

Using this model is straightforward once [sentence-transformers](https://www.SBERT.net) is installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

# trust_remote_code=True allows loading the custom model class shipped with the repository
model = SentenceTransformer('Lajavaness/bilingual-embedding-base', trust_remote_code=True)

# Encode the sentences into embedding vectors
embeddings = model.encode(sentences)
print(embeddings)
```

## Evaluation

TODO

## Citation

@article{conneau2019unsupervised,
  title={Unsupervised cross-lingual representation learning at scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  journal={arXiv preprint arXiv:1911.02116},
  year={2019}
}

@article{reimers2019sentence,
  title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author={Reimers, Nils and Gurevych, Iryna},
  journal={arXiv preprint arXiv:1908.10084},
  year={2019}
}

@article{thakur2020augmented,
  title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
  author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
  journal={arXiv e-prints},
  pages={arXiv--2010},
  year={2020}