---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- mteb
model-index:
- name: bge-m3-custom-fr
  results:
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloProfClusteringP2P
      config: default
      split: test
      revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
    metrics:
    - type: v_measure
      value: 56.727459716713
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloProfClusteringS2S
      config: default
      split: test
      revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
    metrics:
    - type: v_measure
      value: 38.19920006179227
  - task:
      type: Reranking
    dataset:
      type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
      name: MTEB AlloprofReranking
      config: default
      split: test
      revision: e40c8a63ce02da43200eccb5b0846fcaa888f562
    metrics:
    - type: map
      value: 65.17465797499942
    - type: mrr
      value: 66.51400197384653
  - task:
      type: Retrieval
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloprofRetrieval
      config: default
      split: test
      revision: 2df7bee4080bedf2e97de3da6bd5c7bc9fc9c4d2
    metrics:
    - type: map_at_1
      value: 29.836000000000002
    - type: map_at_10
      value: 39.916000000000004
    - type: map_at_100
      value: 40.816
    - type: map_at_1000
      value: 40.877
    - type: map_at_3
      value: 37.294
    - type: map_at_5
      value: 38.838
    - type: mrr_at_1
      value: 29.836000000000002
    - type: mrr_at_10
      value: 39.916000000000004
    - type: mrr_at_100
      value: 40.816
    - type: mrr_at_1000
      value: 40.877
    - type: mrr_at_3
      value: 37.294
    - type: mrr_at_5
      value: 38.838
    - type: ndcg_at_1
      value: 29.836000000000002
    - type: ndcg_at_10
      value: 45.097
    - type: ndcg_at_100
      value: 49.683
    - type: ndcg_at_1000
      value: 51.429
    - type: ndcg_at_3
      value: 39.717
    - type: ndcg_at_5
      value: 42.501
    - type: precision_at_1
      value: 29.836000000000002
    - type: precision_at_10
      value: 6.149
    - type: precision_at_100
      value: 0.8340000000000001
    - type: precision_at_1000
      value: 0.097
    - type: precision_at_3
      value: 15.576
    - type: precision_at_5
      value: 10.698
    - type: recall_at_1
      value: 29.836000000000002
    - type: recall_at_10
      value: 61.485
    - type: recall_at_100
      value: 83.428
    - type: recall_at_1000
      value: 97.461
    - type: recall_at_3
      value: 46.727000000000004
    - type: recall_at_5
      value: 53.489
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (fr)
      config: fr
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 42.332
    - type: f1
      value: 40.801800929404344
  - task:
      type: Retrieval
    dataset:
      type: maastrichtlawtech/bsard
      name: MTEB BSARDRetrieval
      config: default
      split: test
      revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
    metrics:
    - type: map_at_1
      value: 0.0
    - type: map_at_10
      value: 0.0
    - type: map_at_100
      value: 0.011000000000000001
    - type: map_at_1000
      value: 0.018000000000000002
    - type: map_at_3
      value: 0.0
    - type: map_at_5
      value: 0.0
    - type: mrr_at_1
      value: 0.0
    - type: mrr_at_10
      value: 0.0
    - type: mrr_at_100
      value: 0.011000000000000001
    - type: mrr_at_1000
      value: 0.018000000000000002
    - type: mrr_at_3
      value: 0.0
    - type: mrr_at_5
      value: 0.0
    - type: ndcg_at_1
      value: 0.0
    - type: ndcg_at_10
      value: 0.0
    - type: ndcg_at_100
      value: 0.13999999999999999
    - type: ndcg_at_1000
      value: 0.457
    - type: ndcg_at_3
      value: 0.0
    - type: ndcg_at_5
      value: 0.0
    - type: precision_at_1
      value: 0.0
    - type: precision_at_10
      value: 0.0
    - type: precision_at_100
      value: 0.009000000000000001
    - type: precision_at_1000
      value: 0.004
    - type: precision_at_3
      value: 0.0
    - type: precision_at_5
      value: 0.0
    - type: recall_at_1
      value: 0.0
    - type: recall_at_10
      value: 0.0
    - type: recall_at_100
      value: 0.901
    - type: recall_at_1000
      value: 3.604
    - type: recall_at_3
      value: 0.0
    - type: recall_at_5
      value: 0.0
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/clustering-hal-s2s
      name: MTEB HALClusteringS2S
      config: default
      split: test
      revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
    metrics:
    - type: v_measure
      value: 24.1294565929144
  - task:
      type: Clustering
    dataset:
      type: mlsum
      name: MTEB MLSUMClusteringP2P
      config: default
      split: test
      revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
    metrics:
    - type: v_measure
      value: 42.12040762356958
  - task:
      type: Clustering
    dataset:
      type: mlsum
      name: MTEB MLSUMClusteringS2S
      config: default
      split: test
      revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
    metrics:
    - type: v_measure
      value: 36.69102548662494
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (fr)
      config: fr
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 90.3946132164109
    - type: f1
      value: 90.15608090764273
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (fr)
      config: fr
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 60.87691825869088
    - type: f1
      value: 43.56160799721332
  - task:
      type: Classification
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClassification (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: accuracy
      value: 70.52132701421802
    - type: f1
      value: 66.7911493789742
  - task:
      type: Clustering
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClusteringP2P (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: v_measure
      value: 34.60975901092521
  - task:
      type: Clustering
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClusteringS2S (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: v_measure
      value: 32.8092912406207
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (fr)
      config: fr
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 66.70477471418964
    - type: f1
      value: 64.4848306188641
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (fr)
      config: fr
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 74.57969065232011
    - type: f1
      value: 73.58251655418402
  - task:
      type: Retrieval
    dataset:
      type: jinaai/mintakaqa
      name: MTEB MintakaRetrieval (fr)
      config: fr
      split: test
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
    metrics:
    - type: map_at_1
      value: 14.005
    - type: map_at_10
      value: 21.279999999999998
    - type: map_at_100
      value: 22.288
    - type: map_at_1000
      value: 22.404
    - type: map_at_3
      value: 19.151
    - type: map_at_5
      value: 20.322000000000003
    - type: mrr_at_1
      value: 14.005
    - type: mrr_at_10
      value: 21.279999999999998
    - type: mrr_at_100
      value: 22.288
    - type: mrr_at_1000
      value: 22.404
    - type: mrr_at_3
      value: 19.151
    - type: mrr_at_5
      value: 20.322000000000003
    - type: ndcg_at_1
      value: 14.005
    - type: ndcg_at_10
      value: 25.173000000000002
    - type: ndcg_at_100
      value: 30.452
    - type: ndcg_at_1000
      value: 34.241
    - type: ndcg_at_3
      value: 20.768
    - type: ndcg_at_5
      value: 22.869
    - type: precision_at_1
      value: 14.005
    - type: precision_at_10
      value: 3.759
    - type: precision_at_100
      value: 0.631
    - type: precision_at_1000
      value: 0.095
    - type: precision_at_3
      value: 8.477
    - type: precision_at_5
      value: 6.101999999999999
    - type: recall_at_1
      value: 14.005
    - type: recall_at_10
      value: 37.592
    - type: recall_at_100
      value: 63.144999999999996
    - type: recall_at_1000
      value: 94.513
    - type: recall_at_3
      value: 25.430000000000003
    - type: recall_at_5
      value: 30.508000000000003
  - task:
      type: PairClassification
    dataset:
      type: GEM/opusparcus
      name: MTEB OpusparcusPC (fr)
      config: fr
      split: test
      revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
    metrics:
    - type: cos_sim_accuracy
      value: 81.60762942779292
    - type: cos_sim_ap
      value: 93.33850264444463
    - type: cos_sim_f1
      value: 87.24705882352941
    - type: cos_sim_precision
      value: 82.91592128801432
    - type: cos_sim_recall
      value: 92.05561072492551
    - type: dot_accuracy
      value: 81.60762942779292
    - type: dot_ap
      value: 93.33850264444463
    - type: dot_f1
      value: 87.24705882352941
    - type: dot_precision
      value: 82.91592128801432
    - type: dot_recall
      value: 92.05561072492551
    - type: euclidean_accuracy
      value: 81.60762942779292
    - type: euclidean_ap
      value: 93.3384939260791
    - type: euclidean_f1
      value: 87.24705882352941
    - type: euclidean_precision
      value: 82.91592128801432
    - type: euclidean_recall
      value: 92.05561072492551
    - type: manhattan_accuracy
      value: 81.60762942779292
    - type: manhattan_ap
      value: 93.27064794794664
    - type: manhattan_f1
      value: 87.27440999537251
    - type: manhattan_precision
      value: 81.7157712305026
    - type: manhattan_recall
      value: 93.64448857994041
    - type: max_accuracy
      value: 81.60762942779292
    - type: max_ap
      value: 93.33850264444463
    - type: max_f1
      value: 87.27440999537251
  - task:
      type: PairClassification
    dataset:
      type: paws-x
      name: MTEB PawsX (fr)
      config: fr
      split: test
      revision: 8a04d940a42cd40658986fdd8e3da561533a3646
    metrics:
    - type: cos_sim_accuracy
      value: 61.95
    - type: cos_sim_ap
      value: 60.8497942066519
    - type: cos_sim_f1
      value: 62.53032928942807
    - type: cos_sim_precision
      value: 45.50958627648839
    - type: cos_sim_recall
      value: 99.88925802879291
    - type: dot_accuracy
      value: 61.95
    - type: dot_ap
      value: 60.83772617132806
    - type: dot_f1
      value: 62.53032928942807
    - type: dot_precision
      value: 45.50958627648839
    - type: dot_recall
      value: 99.88925802879291
    - type: euclidean_accuracy
      value: 61.95
    - type: euclidean_ap
      value: 60.8497942066519
    - type: euclidean_f1
      value: 62.53032928942807
    - type: euclidean_precision
      value: 45.50958627648839
    - type: euclidean_recall
      value: 99.88925802879291
    - type: manhattan_accuracy
      value: 61.9
    - type: manhattan_ap
      value: 60.87914286416435
    - type: manhattan_f1
      value: 62.491349480968864
    - type: manhattan_precision
      value: 45.44539506794162
    - type: manhattan_recall
      value: 100.0
    - type: max_accuracy
      value: 61.95
    - type: max_ap
      value: 60.87914286416435
    - type: max_f1
      value: 62.53032928942807
  - task:
      type: STS
    dataset:
      type: Lajavaness/SICK-fr
      name: MTEB SICKFr
      config: default
      split: test
      revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
    metrics:
    - type: cos_sim_pearson
      value: 81.24400370393097
    - type: cos_sim_spearman
      value: 75.50548831172674
    - type: euclidean_pearson
      value: 77.81039134726188
    - type: euclidean_spearman
      value: 75.50504199480463
    - type: manhattan_pearson
      value: 77.79383923445839
    - type: manhattan_spearman
      value: 75.472882776806
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (fr)
      config: fr
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_pearson
      value: 80.48474973785514
    - type: cos_sim_spearman
      value: 81.69566405041475
    - type: euclidean_pearson
      value: 78.32784472269549
    - type: euclidean_spearman
      value: 81.69566405041475
    - type: manhattan_pearson
      value: 78.2856100079857
    - type: manhattan_spearman
      value: 81.84463256785325
  - task:
      type: STS
    dataset:
      type: PhilipMay/stsb_multi_mt
      name: MTEB STSBenchmarkMultilingualSTS (fr)
      config: fr
      split: test
      revision: 93d57ef91790589e3ce9c365164337a8a78b7632
    metrics:
    - type: cos_sim_pearson
      value: 80.68785966129913
    - type: cos_sim_spearman
      value: 81.29936344904975
    - type: euclidean_pearson
      value: 80.25462090186443
    - type: euclidean_spearman
      value: 81.29928746010391
    - type: manhattan_pearson
      value: 80.17083094559602
    - type: manhattan_spearman
      value: 81.18921827402406
  - task:
      type: Summarization
    dataset:
      type: lyon-nlp/summarization-summeval-fr-p2p
      name: MTEB SummEvalFr
      config: default
      split: test
      revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
    metrics:
    - type: cos_sim_pearson
      value: 31.66113105701837
    - type: cos_sim_spearman
      value: 30.13316633681715
    - type: dot_pearson
      value: 31.66113064418324
    - type: dot_spearman
      value: 30.13316633681715
  - task:
      type: Reranking
    dataset:
      type: lyon-nlp/mteb-fr-reranking-syntec-s2p
      name: MTEB SyntecReranking
      config: default
      split: test
      revision: b205c5084a0934ce8af14338bf03feb19499c84d
    metrics:
    - type: map
      value: 85.43333333333334
    - type: mrr
      value: 85.43333333333334
  - task:
      type: Retrieval
    dataset:
      type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
      name: MTEB SyntecRetrieval
      config: default
      split: test
      revision: aa460cd4d177e6a3c04fcd2affd95e8243289033
    metrics:
    - type: map_at_1
      value: 65.0
    - type: map_at_10
      value: 75.19200000000001
    - type: map_at_100
      value: 75.77000000000001
    - type: map_at_1000
      value: 75.77000000000001
    - type: map_at_3
      value: 73.667
    - type: map_at_5
      value: 75.067
    - type: mrr_at_1
      value: 65.0
    - type: mrr_at_10
      value: 75.19200000000001
    - type: mrr_at_100
      value: 75.77000000000001
    - type: mrr_at_1000
      value: 75.77000000000001
    - type: mrr_at_3
      value: 73.667
    - type: mrr_at_5
      value: 75.067
    - type: ndcg_at_1
      value: 65.0
    - type: ndcg_at_10
      value: 79.145
    - type: ndcg_at_100
      value: 81.34400000000001
    - type: ndcg_at_1000
      value: 81.34400000000001
    - type: ndcg_at_3
      value: 76.333
    - type: ndcg_at_5
      value: 78.82900000000001
    - type: precision_at_1
      value: 65.0
    - type: precision_at_10
      value: 9.1
    - type: precision_at_100
      value: 1.0
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 28.000000000000004
    - type: precision_at_5
      value: 18.0
    - type: recall_at_1
      value: 65.0
    - type: recall_at_10
      value: 91.0
    - type: recall_at_100
      value: 100.0
    - type: recall_at_1000
      value: 100.0
    - type: recall_at_3
      value: 84.0
    - type: recall_at_5
      value: 90.0
  - task:
      type: Retrieval
    dataset:
      type: jinaai/xpqa
      name: MTEB XPQARetrieval (fr)
      config: fr
      split: test
      revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
    metrics:
    - type: map_at_1
      value: 40.225
    - type: map_at_10
      value: 61.833000000000006
    - type: map_at_100
      value: 63.20400000000001
    - type: map_at_1000
      value: 63.27
    - type: map_at_3
      value: 55.593
    - type: map_at_5
      value: 59.65200000000001
    - type: mrr_at_1
      value: 63.284
    - type: mrr_at_10
      value: 71.351
    - type: mrr_at_100
      value: 71.772
    - type: mrr_at_1000
      value: 71.786
    - type: mrr_at_3
      value: 69.381
    - type: mrr_at_5
      value: 70.703
    - type: ndcg_at_1
      value: 63.284
    - type: ndcg_at_10
      value: 68.49199999999999
    - type: ndcg_at_100
      value: 72.79299999999999
    - type: ndcg_at_1000
      value: 73.735
    - type: ndcg_at_3
      value: 63.278
    - type: ndcg_at_5
      value: 65.19200000000001
    - type: precision_at_1
      value: 63.284
    - type: precision_at_10
      value: 15.661
    - type: precision_at_100
      value: 1.9349999999999998
    - type: precision_at_1000
      value: 0.207
    - type: precision_at_3
      value: 38.273
    - type: precision_at_5
      value: 27.397
    - type: recall_at_1
      value: 40.225
    - type: recall_at_10
      value: 77.66999999999999
    - type: recall_at_100
      value: 93.887
    - type: recall_at_1000
      value: 99.70599999999999
    - type: recall_at_3
      value: 61.133
    - type: recall_at_5
      value: 69.789
---

# bge-m3-custom-fr

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model is easy once you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this, replacing `{MODEL_NAME}` with the model's Hugging Face Hub repository id or a local path:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
```

## Evaluation Results

Scores on French-language MTEB (Massive Text Embedding Benchmark) tasks are listed in the `model-index` metadata at the top of this card. For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Citing & Authors