---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- mteb
model-index:
- name: bge-m3-custom-fr
results:
- task:
type: Clustering
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloProfClusteringP2P
config: default
split: test
revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
metrics:
- type: v_measure
value: 56.727459716713
- task:
type: Clustering
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloProfClusteringS2S
config: default
split: test
revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
metrics:
- type: v_measure
value: 38.19920006179227
- task:
type: Reranking
dataset:
type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
name: MTEB AlloprofReranking
config: default
split: test
revision: e40c8a63ce02da43200eccb5b0846fcaa888f562
metrics:
- type: map
value: 65.17465797499942
- type: mrr
value: 66.51400197384653
- task:
type: Retrieval
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloprofRetrieval
config: default
split: test
revision: 2df7bee4080bedf2e97de3da6bd5c7bc9fc9c4d2
metrics:
- type: map_at_1
value: 29.836000000000002
- type: map_at_10
value: 39.916000000000004
- type: map_at_100
value: 40.816
- type: map_at_1000
value: 40.877
- type: map_at_3
value: 37.294
- type: map_at_5
value: 38.838
- type: mrr_at_1
value: 29.836000000000002
- type: mrr_at_10
value: 39.916000000000004
- type: mrr_at_100
value: 40.816
- type: mrr_at_1000
value: 40.877
- type: mrr_at_3
value: 37.294
- type: mrr_at_5
value: 38.838
- type: ndcg_at_1
value: 29.836000000000002
- type: ndcg_at_10
value: 45.097
- type: ndcg_at_100
value: 49.683
- type: ndcg_at_1000
value: 51.429
- type: ndcg_at_3
value: 39.717
- type: ndcg_at_5
value: 42.501
- type: precision_at_1
value: 29.836000000000002
- type: precision_at_10
value: 6.149
- type: precision_at_100
value: 0.8340000000000001
- type: precision_at_1000
value: 0.097
- type: precision_at_3
value: 15.576
- type: precision_at_5
value: 10.698
- type: recall_at_1
value: 29.836000000000002
- type: recall_at_10
value: 61.485
- type: recall_at_100
value: 83.428
- type: recall_at_1000
value: 97.461
- type: recall_at_3
value: 46.727000000000004
- type: recall_at_5
value: 53.489
- task:
type: Classification
dataset:
type: mteb/amazon_reviews_multi
name: MTEB AmazonReviewsClassification (fr)
config: fr
split: test
revision: 1399c76144fd37290681b995c656ef9b2e06e26d
metrics:
- type: accuracy
value: 42.332
- type: f1
value: 40.801800929404344
- task:
type: Retrieval
dataset:
type: maastrichtlawtech/bsard
name: MTEB BSARDRetrieval
config: default
split: test
revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
metrics:
- type: map_at_1
value: 0.0
- type: map_at_10
value: 0.0
- type: map_at_100
value: 0.011000000000000001
- type: map_at_1000
value: 0.018000000000000002
- type: map_at_3
value: 0.0
- type: map_at_5
value: 0.0
- type: mrr_at_1
value: 0.0
- type: mrr_at_10
value: 0.0
- type: mrr_at_100
value: 0.011000000000000001
- type: mrr_at_1000
value: 0.018000000000000002
- type: mrr_at_3
value: 0.0
- type: mrr_at_5
value: 0.0
- type: ndcg_at_1
value: 0.0
- type: ndcg_at_10
value: 0.0
- type: ndcg_at_100
value: 0.13999999999999999
- type: ndcg_at_1000
value: 0.457
- type: ndcg_at_3
value: 0.0
- type: ndcg_at_5
value: 0.0
- type: precision_at_1
value: 0.0
- type: precision_at_10
value: 0.0
- type: precision_at_100
value: 0.009000000000000001
- type: precision_at_1000
value: 0.004
- type: precision_at_3
value: 0.0
- type: precision_at_5
value: 0.0
- type: recall_at_1
value: 0.0
- type: recall_at_10
value: 0.0
- type: recall_at_100
value: 0.901
- type: recall_at_1000
value: 3.604
- type: recall_at_3
value: 0.0
- type: recall_at_5
value: 0.0
- task:
type: Clustering
dataset:
type: lyon-nlp/clustering-hal-s2s
name: MTEB HALClusteringS2S
config: default
split: test
revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
metrics:
- type: v_measure
value: 24.1294565929144
- task:
type: Clustering
dataset:
type: mlsum
name: MTEB MLSUMClusteringP2P
config: default
split: test
revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
metrics:
- type: v_measure
value: 42.12040762356958
- task:
type: Clustering
dataset:
type: mlsum
name: MTEB MLSUMClusteringS2S
config: default
split: test
revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
metrics:
- type: v_measure
value: 36.69102548662494
- task:
type: Classification
dataset:
type: mteb/mtop_domain
name: MTEB MTOPDomainClassification (fr)
config: fr
split: test
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
metrics:
- type: accuracy
value: 90.3946132164109
- type: f1
value: 90.15608090764273
- task:
type: Classification
dataset:
type: mteb/mtop_intent
name: MTEB MTOPIntentClassification (fr)
config: fr
split: test
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
metrics:
- type: accuracy
value: 60.87691825869088
- type: f1
value: 43.56160799721332
- task:
type: Classification
dataset:
type: masakhane/masakhanews
name: MTEB MasakhaNEWSClassification (fra)
config: fra
split: test
revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
metrics:
- type: accuracy
value: 70.52132701421802
- type: f1
value: 66.7911493789742
- task:
type: Clustering
dataset:
type: masakhane/masakhanews
name: MTEB MasakhaNEWSClusteringP2P (fra)
config: fra
split: test
revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
metrics:
- type: v_measure
value: 34.60975901092521
- task:
type: Clustering
dataset:
type: masakhane/masakhanews
name: MTEB MasakhaNEWSClusteringS2S (fra)
config: fra
split: test
revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
metrics:
- type: v_measure
value: 32.8092912406207
- task:
type: Classification
dataset:
type: mteb/amazon_massive_intent
name: MTEB MassiveIntentClassification (fr)
config: fr
split: test
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
metrics:
- type: accuracy
value: 66.70477471418964
- type: f1
value: 64.4848306188641
- task:
type: Classification
dataset:
type: mteb/amazon_massive_scenario
name: MTEB MassiveScenarioClassification (fr)
config: fr
split: test
revision: 7d571f92784cd94a019292a1f45445077d0ef634
metrics:
- type: accuracy
value: 74.57969065232011
- type: f1
value: 73.58251655418402
- task:
type: Retrieval
dataset:
type: jinaai/mintakaqa
name: MTEB MintakaRetrieval (fr)
config: fr
split: test
revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
metrics:
- type: map_at_1
value: 14.005
- type: map_at_10
value: 21.279999999999998
- type: map_at_100
value: 22.288
- type: map_at_1000
value: 22.404
- type: map_at_3
value: 19.151
- type: map_at_5
value: 20.322000000000003
- type: mrr_at_1
value: 14.005
- type: mrr_at_10
value: 21.279999999999998
- type: mrr_at_100
value: 22.288
- type: mrr_at_1000
value: 22.404
- type: mrr_at_3
value: 19.151
- type: mrr_at_5
value: 20.322000000000003
- type: ndcg_at_1
value: 14.005
- type: ndcg_at_10
value: 25.173000000000002
- type: ndcg_at_100
value: 30.452
- type: ndcg_at_1000
value: 34.241
- type: ndcg_at_3
value: 20.768
- type: ndcg_at_5
value: 22.869
- type: precision_at_1
value: 14.005
- type: precision_at_10
value: 3.759
- type: precision_at_100
value: 0.631
- type: precision_at_1000
value: 0.095
- type: precision_at_3
value: 8.477
- type: precision_at_5
value: 6.101999999999999
- type: recall_at_1
value: 14.005
- type: recall_at_10
value: 37.592
- type: recall_at_100
value: 63.144999999999996
- type: recall_at_1000
value: 94.513
- type: recall_at_3
value: 25.430000000000003
- type: recall_at_5
value: 30.508000000000003
- task:
type: PairClassification
dataset:
type: GEM/opusparcus
name: MTEB OpusparcusPC (fr)
config: fr
split: test
revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
metrics:
- type: cos_sim_accuracy
value: 81.60762942779292
- type: cos_sim_ap
value: 93.33850264444463
- type: cos_sim_f1
value: 87.24705882352941
- type: cos_sim_precision
value: 82.91592128801432
- type: cos_sim_recall
value: 92.05561072492551
- type: dot_accuracy
value: 81.60762942779292
- type: dot_ap
value: 93.33850264444463
- type: dot_f1
value: 87.24705882352941
- type: dot_precision
value: 82.91592128801432
- type: dot_recall
value: 92.05561072492551
- type: euclidean_accuracy
value: 81.60762942779292
- type: euclidean_ap
value: 93.3384939260791
- type: euclidean_f1
value: 87.24705882352941
- type: euclidean_precision
value: 82.91592128801432
- type: euclidean_recall
value: 92.05561072492551
- type: manhattan_accuracy
value: 81.60762942779292
- type: manhattan_ap
value: 93.27064794794664
- type: manhattan_f1
value: 87.27440999537251
- type: manhattan_precision
value: 81.7157712305026
- type: manhattan_recall
value: 93.64448857994041
- type: max_accuracy
value: 81.60762942779292
- type: max_ap
value: 93.33850264444463
- type: max_f1
value: 87.27440999537251
- task:
type: PairClassification
dataset:
type: paws-x
name: MTEB PawsX (fr)
config: fr
split: test
revision: 8a04d940a42cd40658986fdd8e3da561533a3646
metrics:
- type: cos_sim_accuracy
value: 61.95
- type: cos_sim_ap
value: 60.8497942066519
- type: cos_sim_f1
value: 62.53032928942807
- type: cos_sim_precision
value: 45.50958627648839
- type: cos_sim_recall
value: 99.88925802879291
- type: dot_accuracy
value: 61.95
- type: dot_ap
value: 60.83772617132806
- type: dot_f1
value: 62.53032928942807
- type: dot_precision
value: 45.50958627648839
- type: dot_recall
value: 99.88925802879291
- type: euclidean_accuracy
value: 61.95
- type: euclidean_ap
value: 60.8497942066519
- type: euclidean_f1
value: 62.53032928942807
- type: euclidean_precision
value: 45.50958627648839
- type: euclidean_recall
value: 99.88925802879291
- type: manhattan_accuracy
value: 61.9
- type: manhattan_ap
value: 60.87914286416435
- type: manhattan_f1
value: 62.491349480968864
- type: manhattan_precision
value: 45.44539506794162
- type: manhattan_recall
value: 100.0
- type: max_accuracy
value: 61.95
- type: max_ap
value: 60.87914286416435
- type: max_f1
value: 62.53032928942807
- task:
type: STS
dataset:
type: Lajavaness/SICK-fr
name: MTEB SICKFr
config: default
split: test
revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
metrics:
- type: cos_sim_pearson
value: 81.24400370393097
- type: cos_sim_spearman
value: 75.50548831172674
- type: euclidean_pearson
value: 77.81039134726188
- type: euclidean_spearman
value: 75.50504199480463
- type: manhattan_pearson
value: 77.79383923445839
- type: manhattan_spearman
value: 75.472882776806
- task:
type: STS
dataset:
type: mteb/sts22-crosslingual-sts
name: MTEB STS22 (fr)
config: fr
split: test
revision: eea2b4fe26a775864c896887d910b76a8098ad3f
metrics:
- type: cos_sim_pearson
value: 80.48474973785514
- type: cos_sim_spearman
value: 81.69566405041475
- type: euclidean_pearson
value: 78.32784472269549
- type: euclidean_spearman
value: 81.69566405041475
- type: manhattan_pearson
value: 78.2856100079857
- type: manhattan_spearman
value: 81.84463256785325
- task:
type: STS
dataset:
type: PhilipMay/stsb_multi_mt
name: MTEB STSBenchmarkMultilingualSTS (fr)
config: fr
split: test
revision: 93d57ef91790589e3ce9c365164337a8a78b7632
metrics:
- type: cos_sim_pearson
value: 80.68785966129913
- type: cos_sim_spearman
value: 81.29936344904975
- type: euclidean_pearson
value: 80.25462090186443
- type: euclidean_spearman
value: 81.29928746010391
- type: manhattan_pearson
value: 80.17083094559602
- type: manhattan_spearman
value: 81.18921827402406
- task:
type: Summarization
dataset:
type: lyon-nlp/summarization-summeval-fr-p2p
name: MTEB SummEvalFr
config: default
split: test
revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
metrics:
- type: cos_sim_pearson
value: 31.66113105701837
- type: cos_sim_spearman
value: 30.13316633681715
- type: dot_pearson
value: 31.66113064418324
- type: dot_spearman
value: 30.13316633681715
- task:
type: Reranking
dataset:
type: lyon-nlp/mteb-fr-reranking-syntec-s2p
name: MTEB SyntecReranking
config: default
split: test
revision: b205c5084a0934ce8af14338bf03feb19499c84d
metrics:
- type: map
value: 85.43333333333334
- type: mrr
value: 85.43333333333334
- task:
type: Retrieval
dataset:
type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
name: MTEB SyntecRetrieval
config: default
split: test
revision: aa460cd4d177e6a3c04fcd2affd95e8243289033
metrics:
- type: map_at_1
value: 65.0
- type: map_at_10
value: 75.19200000000001
- type: map_at_100
value: 75.77000000000001
- type: map_at_1000
value: 75.77000000000001
- type: map_at_3
value: 73.667
- type: map_at_5
value: 75.067
- type: mrr_at_1
value: 65.0
- type: mrr_at_10
value: 75.19200000000001
- type: mrr_at_100
value: 75.77000000000001
- type: mrr_at_1000
value: 75.77000000000001
- type: mrr_at_3
value: 73.667
- type: mrr_at_5
value: 75.067
- type: ndcg_at_1
value: 65.0
- type: ndcg_at_10
value: 79.145
- type: ndcg_at_100
value: 81.34400000000001
- type: ndcg_at_1000
value: 81.34400000000001
- type: ndcg_at_3
value: 76.333
- type: ndcg_at_5
value: 78.82900000000001
- type: precision_at_1
value: 65.0
- type: precision_at_10
value: 9.1
- type: precision_at_100
value: 1.0
- type: precision_at_1000
value: 0.1
- type: precision_at_3
value: 28.000000000000004
- type: precision_at_5
value: 18.0
- type: recall_at_1
value: 65.0
- type: recall_at_10
value: 91.0
- type: recall_at_100
value: 100.0
- type: recall_at_1000
value: 100.0
- type: recall_at_3
value: 84.0
- type: recall_at_5
value: 90.0
- task:
type: Retrieval
dataset:
type: jinaai/xpqa
name: MTEB XPQARetrieval (fr)
config: fr
split: test
revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
metrics:
- type: map_at_1
value: 40.225
- type: map_at_10
value: 61.833000000000006
- type: map_at_100
value: 63.20400000000001
- type: map_at_1000
value: 63.27
- type: map_at_3
value: 55.593
- type: map_at_5
value: 59.65200000000001
- type: mrr_at_1
value: 63.284
- type: mrr_at_10
value: 71.351
- type: mrr_at_100
value: 71.772
- type: mrr_at_1000
value: 71.786
- type: mrr_at_3
value: 69.381
- type: mrr_at_5
value: 70.703
- type: ndcg_at_1
value: 63.284
- type: ndcg_at_10
value: 68.49199999999999
- type: ndcg_at_100
value: 72.79299999999999
- type: ndcg_at_1000
value: 73.735
- type: ndcg_at_3
value: 63.278
- type: ndcg_at_5
value: 65.19200000000001
- type: precision_at_1
value: 63.284
- type: precision_at_10
value: 15.661
- type: precision_at_100
value: 1.9349999999999998
- type: precision_at_1000
value: 0.207
- type: precision_at_3
value: 38.273
- type: precision_at_5
value: 27.397
- type: recall_at_1
value: 40.225
- type: recall_at_10
value: 77.66999999999999
- type: recall_at_100
value: 93.887
- type: recall_at_1000
value: 99.70599999999999
- type: recall_at_3
value: 61.133
- type: recall_at_5
value: 69.789
---
# bge-m3-custom-fr
This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
<!--- Describe your model here -->
## Usage (Sentence-Transformers)
To use this model, first install [sentence-transformers](https://www.SBERT.net):
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
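from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Replace {MODEL_NAME} with this model's Hub repository ID or a local path.
model = SentenceTransformer('{MODEL_NAME}')
# encode() returns one 1024-dimensional embedding per input sentence.
embeddings = model.encode(sentences)
print(embeddings)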
```
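Since the architecture listed under Full Model Architecture ends with a `Normalize()` layer, the embeddings should have unit length, so cosine similarity is a natural way to compare them for semantic search. The following is a minimal sketch of that ranking step, not part of the library: the helper name `rank_by_cosine` is hypothetical, and the toy 3-dimensional vectors stand in for real `model.encode(...)` output.

```python
import numpy as np

def rank_by_cosine(query_emb, doc_embs):
    """Return (document indices sorted by descending cosine similarity, scores)."""
    query = np.asarray(query_emb, dtype=np.float64)
    docs = np.asarray(doc_embs, dtype=np.float64)
    # Normalize defensively so the helper also works on unnormalized vectors.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query          # one cosine similarity per document
    order = np.argsort(-scores)    # highest similarity first
    return order, scores

# Toy vectors standing in for real 1024-dimensional embeddings (assumption).
query = [1.0, 0.0, 0.0]
documents = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
order, scores = rank_by_cosine(query, documents)
print(order.tolist())  # → [1, 2, 0]
```

With the real model, `query` would be `model.encode("...")` for a single string and `documents` would be `model.encode([...])` for a list of passages.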
## Evaluation Results
<!--- Describe how your model was evaluated -->
For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
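To make the last two stages concrete: with `pooling_mode_cls_token` set to `True`, the sentence embedding is taken from the first token's hidden state, and `Normalize()` then rescales it to unit L2 norm. The sketch below mimics those two steps in NumPy on random data; the function name `cls_pool_and_normalize` and the random input are illustrative assumptions, and this is not the library's own implementation.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings):
    """Mimic CLS pooling followed by L2 normalization.

    token_embeddings: array of shape (batch, seq_len, hidden)
    returns: array of shape (batch, hidden), each row with unit L2 norm
    """
    tokens = np.asarray(token_embeddings, dtype=np.float64)
    cls = tokens[:, 0, :]  # CLS pooling: keep only the first token's vector
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)  # guard against division by zero

# Random stand-in for the transformer's output: 2 sentences, 5 tokens, 1024 dims.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 1024))
sentence_embeddings = cls_pool_and_normalize(fake_tokens)
print(sentence_embeddings.shape)  # → (2, 1024)
```

Because every row has unit norm, the dot product of two embeddings equals their cosine similarity.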
## Citing & Authors
<!--- Describe where people can find more information -->