---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- sentence-embedding
- mteb
model-index:
- name: bilingual-document-embedding
results:
- task:
type: Clustering
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloProfClusteringP2P
config: default
split: test
revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
metrics:
- type: v_measure
value: 55.52298673909706
- type: v_measures
value: [0.5198748380785404, 0.5562521099012603, 0.5322986254464575, 0.5722250987615152, 0.532932258758668]
- task:
type: Clustering
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloProfClusteringS2S
config: default
split: test
revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
metrics:
- type: v_measure
value: 35.802733348353094
- type: v_measures
value: [0.37359796790048144, 0.36376421464272285, 0.37524966704915225, 0.3749296797757371, 0.36673700158106576]
- task:
type: Reranking
dataset:
type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
name: MTEB AlloprofReranking
config: default
split: test
revision: 65393d0d7a08a10b4e348135e824f385d420b0fd
metrics:
- type: map
value: 73.10088493122083
- type: mrr
value: 74.33452929243086
- type: nAUC_map_diff1
value: 56.63750231223696
- type: nAUC_map_max
value: 27.066268470355492
- type: nAUC_mrr_diff1
value: 55.33487252773409
- type: nAUC_mrr_max
value: 27.328424865584367
- task:
type: Retrieval
dataset:
type: lyon-nlp/alloprof
name: MTEB AlloprofRetrieval
config: default
split: test
revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd
metrics:
- type: map_at_1
value: 28.282
- type: map_at_10
value: 38.805
- type: map_at_100
value: 39.804
- type: map_at_1000
value: 39.859
- type: map_at_20
value: 39.428999999999995
- type: map_at_3
value: 35.838
- type: map_at_5
value: 37.537
- type: mrr_at_1
value: 28.281519861830745
- type: mrr_at_10
value: 38.805171752063
- type: mrr_at_100
value: 39.80444636265999
- type: mrr_at_1000
value: 39.858722309770464
- type: mrr_at_20
value: 39.42867574383368
- type: mrr_at_3
value: 35.83765112262528
- type: mrr_at_5
value: 37.53670120898111
- type: nauc_map_at_1000_diff1
value: 42.44333601195352
- type: nauc_map_at_1000_max
value: 41.88361927698375
- type: nauc_map_at_100_diff1
value: 42.42746715522874
- type: nauc_map_at_100_max
value: 41.913701611267015
- type: nauc_map_at_10_diff1
value: 42.25094726032311
- type: nauc_map_at_10_max
value: 41.76772459035808
- type: nauc_map_at_1_diff1
value: 48.03355307282109
- type: nauc_map_at_1_max
value: 38.73226070718987
- type: nauc_map_at_20_diff1
value: 42.3550770875435
- type: nauc_map_at_20_max
value: 41.89957212013687
- type: nauc_map_at_3_diff1
value: 42.88695727955848
- type: nauc_map_at_3_max
value: 40.81262402287836
- type: nauc_map_at_5_diff1
value: 42.34041989483334
- type: nauc_map_at_5_max
value: 41.36458206255729
- type: nauc_mrr_at_1000_diff1
value: 42.44333601195352
- type: nauc_mrr_at_1000_max
value: 41.88361927698375
- type: nauc_mrr_at_100_diff1
value: 42.42746715522874
- type: nauc_mrr_at_100_max
value: 41.913701611267015
- type: nauc_mrr_at_10_diff1
value: 42.25094726032311
- type: nauc_mrr_at_10_max
value: 41.76772459035808
- type: nauc_mrr_at_1_diff1
value: 48.03355307282109
- type: nauc_mrr_at_1_max
value: 38.73226070718987
- type: nauc_mrr_at_20_diff1
value: 42.3550770875435
- type: nauc_mrr_at_20_max
value: 41.89957212013687
- type: nauc_mrr_at_3_diff1
value: 42.88695727955848
- type: nauc_mrr_at_3_max
value: 40.81262402287836
- type: nauc_mrr_at_5_diff1
value: 42.34041989483334
- type: nauc_mrr_at_5_max
value: 41.36458206255729
- type: nauc_ndcg_at_1000_diff1
value: 41.35830258715452
- type: nauc_ndcg_at_1000_max
value: 43.2765379475269
- type: nauc_ndcg_at_100_diff1
value: 40.95047094384412
- type: nauc_ndcg_at_100_max
value: 44.293436979483594
- type: nauc_ndcg_at_10_diff1
value: 40.0359339979518
- type: nauc_ndcg_at_10_max
value: 43.390909520520076
- type: nauc_ndcg_at_1_diff1
value: 48.03355307282109
- type: nauc_ndcg_at_1_max
value: 38.73226070718987
- type: nauc_ndcg_at_20_diff1
value: 40.35056898575259
- type: nauc_ndcg_at_20_max
value: 43.991764985610345
- type: nauc_ndcg_at_3_diff1
value: 41.40129960980627
- type: nauc_ndcg_at_3_max
value: 41.41104483378663
- type: nauc_ndcg_at_5_diff1
value: 40.36007384476364
- type: nauc_ndcg_at_5_max
value: 42.383481303106414
- type: nauc_precision_at_1000_diff1
value: 34.36657255351115
- type: nauc_precision_at_1000_max
value: 74.91976431868189
- type: nauc_precision_at_100_diff1
value: 33.55702830592739
- type: nauc_precision_at_100_max
value: 65.71347416493107
- type: nauc_precision_at_10_diff1
value: 32.521448032129
- type: nauc_precision_at_10_max
value: 49.19930788953475
- type: nauc_precision_at_1_diff1
value: 48.03355307282109
- type: nauc_precision_at_1_max
value: 38.73226070718987
- type: nauc_precision_at_20_diff1
value: 32.57892299703891
- type: nauc_precision_at_20_max
value: 53.45967162017302
- type: nauc_precision_at_3_diff1
value: 37.249551957650795
- type: nauc_precision_at_3_max
value: 43.08267504682664
- type: nauc_precision_at_5_diff1
value: 34.44393985129692
- type: nauc_precision_at_5_max
value: 45.460096642832646
- type: nauc_recall_at_1000_diff1
value: 34.36657255350965
- type: nauc_recall_at_1000_max
value: 74.91976431868211
- type: nauc_recall_at_100_diff1
value: 33.55702830592741
- type: nauc_recall_at_100_max
value: 65.71347416493116
- type: nauc_recall_at_10_diff1
value: 32.52144803212901
- type: nauc_recall_at_10_max
value: 49.199307889534715
- type: nauc_recall_at_1_diff1
value: 48.03355307282109
- type: nauc_recall_at_1_max
value: 38.73226070718987
- type: nauc_recall_at_20_diff1
value: 32.57892299703892
- type: nauc_recall_at_20_max
value: 53.45967162017302
- type: nauc_recall_at_3_diff1
value: 37.249551957650766
- type: nauc_recall_at_3_max
value: 43.082675046826644
- type: nauc_recall_at_5_diff1
value: 34.4439398512969
- type: nauc_recall_at_5_max
value: 45.460096642832674
- type: ndcg_at_1
value: 28.282
- type: ndcg_at_10
value: 44.421
- type: ndcg_at_100
value: 49.447
- type: ndcg_at_1000
value: 50.981
- type: ndcg_at_20
value: 46.671
- type: ndcg_at_3
value: 38.289
- type: ndcg_at_5
value: 41.349999999999994
- type: precision_at_1
value: 28.282
- type: precision_at_10
value: 6.231
- type: precision_at_100
value: 0.8619999999999999
- type: precision_at_1000
value: 0.098
- type: precision_at_20
value: 3.558
- type: precision_at_3
value: 15.126999999999999
- type: precision_at_5
value: 10.561
- type: recall_at_1
value: 28.282
- type: recall_at_10
value: 62.306
- type: recall_at_100
value: 86.226
- type: recall_at_1000
value: 98.489
- type: recall_at_20
value: 71.15700000000001
- type: recall_at_3
value: 45.379999999999995
- type: recall_at_5
value: 52.807
- task:
type: Classification
dataset:
type: mteb/amazon_reviews_multi
name: MTEB AmazonReviewsClassification (fr)
config: fr
split: test
revision: 1399c76144fd37290681b995c656ef9b2e06e26d
metrics:
- type: accuracy
value: 44.10999999999999
- type: f1
value: 42.00584553745547
- type: f1_weighted
value: 42.005845537455485
- task:
type: Retrieval
dataset:
type: maastrichtlawtech/bsard
name: MTEB BSARDRetrieval
config: default
split: test
revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
metrics:
- type: map_at_1
value: 4.955
- type: map_at_10
value: 9.103
- type: map_at_100
value: 9.998999999999999
- type: map_at_1000
value: 10.136000000000001
- type: map_at_20
value: 9.554
- type: map_at_3
value: 7.4319999999999995
- type: map_at_5
value: 7.95
- type: mrr_at_1
value: 4.954954954954955
- type: mrr_at_10
value: 9.102852852852852
- type: mrr_at_100
value: 9.999215850941926
- type: mrr_at_1000
value: 10.13616308946331
- type: mrr_at_20
value: 9.554402632963003
- type: mrr_at_3
value: 7.432432432432434
- type: mrr_at_5
value: 7.950450450450451
- type: nauc_map_at_1000_diff1
value: 14.655819915811785
- type: nauc_map_at_1000_max
value: 9.188182207979008
- type: nauc_map_at_100_diff1
value: 14.517637755979687
- type: nauc_map_at_100_max
value: 9.060725563022503
- type: nauc_map_at_10_diff1
value: 15.776144582905358
- type: nauc_map_at_10_max
value: 9.448668398689462
- type: nauc_map_at_1_diff1
value: 19.10921794840591
- type: nauc_map_at_1_max
value: 4.060331068810239
- type: nauc_map_at_20_diff1
value: 15.061809327427353
- type: nauc_map_at_20_max
value: 9.085953657690329
- type: nauc_map_at_3_diff1
value: 18.42793018906856
- type: nauc_map_at_3_max
value: 10.10140103912974
- type: nauc_map_at_5_diff1
value: 17.407972669931233
- type: nauc_map_at_5_max
value: 10.064885264376228
- type: nauc_mrr_at_1000_diff1
value: 14.655819915811785
- type: nauc_mrr_at_1000_max
value: 9.188182207979008
- type: nauc_mrr_at_100_diff1
value: 14.517637755979687
- type: nauc_mrr_at_100_max
value: 9.060725563022503
- type: nauc_mrr_at_10_diff1
value: 15.776144582905358
- type: nauc_mrr_at_10_max
value: 9.448668398689462
- type: nauc_mrr_at_1_diff1
value: 19.10921794840591
- type: nauc_mrr_at_1_max
value: 4.060331068810239
- type: nauc_mrr_at_20_diff1
value: 15.061809327427353
- type: nauc_mrr_at_20_max
value: 9.085953657690329
- type: nauc_mrr_at_3_diff1
value: 18.42793018906856
- type: nauc_mrr_at_3_max
value: 10.10140103912974
- type: nauc_mrr_at_5_diff1
value: 17.407972669931233
- type: nauc_mrr_at_5_max
value: 10.064885264376228
- type: nauc_ndcg_at_1000_diff1
value: 11.940580725648152
- type: nauc_ndcg_at_1000_max
value: 11.004283166102807
- type: nauc_ndcg_at_100_diff1
value: 10.009680762933215
- type: nauc_ndcg_at_100_max
value: 8.444186642393188
- type: nauc_ndcg_at_10_diff1
value: 14.423251037136561
- type: nauc_ndcg_at_10_max
value: 10.614014795363303
- type: nauc_ndcg_at_1_diff1
value: 19.10921794840591
- type: nauc_ndcg_at_1_max
value: 4.060331068810239
- type: nauc_ndcg_at_20_diff1
value: 12.486198272876521
- type: nauc_ndcg_at_20_max
value: 9.550225653436467
- type: nauc_ndcg_at_3_diff1
value: 18.813915768129757
- type: nauc_ndcg_at_3_max
value: 11.865670858870484
- type: nauc_ndcg_at_5_diff1
value: 17.01715479783127
- type: nauc_ndcg_at_5_max
value: 11.523181173967899
- type: nauc_precision_at_1000_diff1
value: 6.162580085242911
- type: nauc_precision_at_1000_max
value: 21.74545120171883
- type: nauc_precision_at_100_diff1
value: 1.4492186570094137
- type: nauc_precision_at_100_max
value: 5.320582161712451
- type: nauc_precision_at_10_diff1
value: 12.199838986983115
- type: nauc_precision_at_10_max
value: 12.409471572004998
- type: nauc_precision_at_1_diff1
value: 19.10921794840591
- type: nauc_precision_at_1_max
value: 4.060331068810239
- type: nauc_precision_at_20_diff1
value: 8.089525252638769
- type: nauc_precision_at_20_max
value: 9.829600854870332
- type: nauc_precision_at_3_diff1
value: 19.71630962813128
- type: nauc_precision_at_3_max
value: 15.560242379569136
- type: nauc_precision_at_5_diff1
value: 16.151579517326258
- type: nauc_precision_at_5_max
value: 14.225120177799683
- type: nauc_recall_at_1000_diff1
value: 6.1625800852429595
- type: nauc_recall_at_1000_max
value: 21.745451201718687
- type: nauc_recall_at_100_diff1
value: 1.4492186570093863
- type: nauc_recall_at_100_max
value: 5.320582161712405
- type: nauc_recall_at_10_diff1
value: 12.199838986983083
- type: nauc_recall_at_10_max
value: 12.409471572004962
- type: nauc_recall_at_1_diff1
value: 19.10921794840591
- type: nauc_recall_at_1_max
value: 4.060331068810239
- type: nauc_recall_at_20_diff1
value: 8.089525252638692
- type: nauc_recall_at_20_max
value: 9.829600854870273
- type: nauc_recall_at_3_diff1
value: 19.716309628131278
- type: nauc_recall_at_3_max
value: 15.560242379569129
- type: nauc_recall_at_5_diff1
value: 16.151579517326265
- type: nauc_recall_at_5_max
value: 14.225120177799697
- type: ndcg_at_1
value: 4.955
- type: ndcg_at_10
value: 12.005
- type: ndcg_at_100
value: 17.238
- type: ndcg_at_1000
value: 21.287
- type: ndcg_at_20
value: 13.691999999999998
- type: ndcg_at_3
value: 8.296000000000001
- type: ndcg_at_5
value: 9.225999999999999
- type: precision_at_1
value: 4.955
- type: precision_at_10
value: 2.162
- type: precision_at_100
value: 0.482
- type: precision_at_1000
value: 0.08099999999999999
- type: precision_at_20
value: 1.419
- type: precision_at_3
value: 3.604
- type: precision_at_5
value: 2.613
- type: recall_at_1
value: 4.955
- type: recall_at_10
value: 21.622
- type: recall_at_100
value: 48.198
- type: recall_at_1000
value: 81.081
- type: recall_at_20
value: 28.377999999999997
- type: recall_at_3
value: 10.811
- type: recall_at_5
value: 13.062999999999999
- task:
type: Clustering
dataset:
type: lyon-nlp/clustering-hal-s2s
name: MTEB HALClusteringS2S
config: default
split: test
revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
metrics:
- type: v_measure
value: 23.137623974622365
- type: v_measures
value: [0.2802068838665942, 0.2565274984774815, 0.25245022445056786, 0.22595460950575297, 0.20177741591393913]
- task:
type: Clustering
dataset:
type: reciTAL/mlsum
name: MTEB MLSUMClusteringP2P
config: default
split: test
revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
metrics:
- type: v_measure
value: 40.31003279146524
- type: v_measures
value: [0.3943651322813771, 0.4189000344922205, 0.4101443880670743, 0.3832149080991847, 0.37602613534689566]
- task:
type: Clustering
dataset:
type: reciTAL/mlsum
name: MTEB MLSUMClusteringS2S
config: default
split: test
revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
metrics:
- type: v_measure
value: 40.04524841336757
- type: v_measures
value: [0.39835449199860185, 0.405905613221237, 0.40326782414397255, 0.40882879348632284, 0.3683302592759367]
- task:
type: Classification
dataset:
type: mteb/mtop_domain
name: MTEB MTOPDomainClassification (fr)
config: fr
split: test
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
metrics:
- type: accuracy
value: 87.82023175696837
- type: f1
value: 87.58287510797385
- type: f1_weighted
value: 87.75645870762435
- task:
type: Classification
dataset:
type: mteb/mtop_intent
name: MTEB MTOPIntentClassification (fr)
config: fr
split: test
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
metrics:
- type: accuracy
value: 58.628249295333546
- type: f1
value: 42.22070573172825
- type: f1_weighted
value: 60.62087995743649
- task:
type: Classification
dataset:
type: mteb/masakhanews
name: MTEB MasakhaNEWSClassification (fra)
config: fra
split: test
revision: 18193f187b92da67168c655c9973a165ed9593dd
metrics:
- type: accuracy
value: 69.81042654028435
- type: f1
value: 66.05811881796396
- type: f1_weighted
value: 70.34901566149948
- task:
type: Clustering
dataset:
type: masakhane/masakhanews
name: MTEB MasakhaNEWSClusteringP2P (fra)
config: fra
split: test
revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
metrics:
- type: v_measure
value: 45.02712178986078
- type: v_measures
value: [1.0, 0.23955793240111928, 0.7158920010774062, 0.036391635653837, 0.25951452036067674]
- task:
type: Clustering
dataset:
type: masakhane/masakhanews
name: MTEB MasakhaNEWSClusteringS2S (fra)
config: fra
split: test
revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
metrics:
- type: v_measure
value: 30.38607254306223
- type: v_measures
value: [1.0, 0.01936507478006705, 0.19876372667844472, 0.17182595867380823, 0.12934886702079137]
- task:
type: Classification
dataset:
type: mteb/amazon_massive_intent
name: MTEB MassiveIntentClassification (fr)
config: fr
split: test
revision: 4672e20407010da34463acc759c162ca9734bca6
metrics:
- type: accuracy
value: 66.13651647612645
- type: f1
value: 64.42898347709598
- type: f1_weighted
value: 65.01442547020224
- task:
type: Classification
dataset:
type: mteb/amazon_massive_scenario
name: MTEB MassiveScenarioClassification (fr)
config: fr
split: test
revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
metrics:
- type: accuracy
value: 72.73705447209144
- type: f1
value: 72.09285609231057
- type: f1_weighted
value: 72.34295244611339
- task:
type: Retrieval
dataset:
type: jinaai/mintakaqa
name: MTEB MintakaRetrieval (fr)
config: fr
split: test
revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
metrics:
- type: map_at_1
value: 13.677
- type: map_at_10
value: 21.044
- type: map_at_100
value: 22.012
- type: map_at_1000
value: 22.125
- type: map_at_20
value: 21.573999999999998
- type: map_at_3
value: 18.857
- type: map_at_5
value: 19.936999999999998
- type: mrr_at_1
value: 13.677313677313677
- type: mrr_at_10
value: 21.043933543933505
- type: mrr_at_100
value: 22.012160523798318
- type: mrr_at_1000
value: 22.124555014776913
- type: mrr_at_20
value: 21.574199922074904
- type: mrr_at_3
value: 18.857493857493825
- type: mrr_at_5
value: 19.93652743652738
- type: nauc_map_at_1000_diff1
value: 20.63087633823352
- type: nauc_map_at_1000_max
value: 31.753246807516362
- type: nauc_map_at_100_diff1
value: 20.602874174259885
- type: nauc_map_at_100_max
value: 31.74109681792161
- type: nauc_map_at_10_diff1
value: 20.82028964049537
- type: nauc_map_at_10_max
value: 32.082751313883705
- type: nauc_map_at_1_diff1
value: 27.838566854973656
- type: nauc_map_at_1_max
value: 32.0217755083183
- type: nauc_map_at_20_diff1
value: 20.685607874578192
- type: nauc_map_at_20_max
value: 31.89765440964895
- type: nauc_map_at_3_diff1
value: 22.385335765437958
- type: nauc_map_at_3_max
value: 32.47346568889047
- type: nauc_map_at_5_diff1
value: 21.173253596770003
- type: nauc_map_at_5_max
value: 32.2528418460596
- type: nauc_mrr_at_1000_diff1
value: 20.63087633823352
- type: nauc_mrr_at_1000_max
value: 31.753246807516362
- type: nauc_mrr_at_100_diff1
value: 20.602874174259885
- type: nauc_mrr_at_100_max
value: 31.74109681792161
- type: nauc_mrr_at_10_diff1
value: 20.82028964049537
- type: nauc_mrr_at_10_max
value: 32.082751313883705
- type: nauc_mrr_at_1_diff1
value: 27.838566854973656
- type: nauc_mrr_at_1_max
value: 32.0217755083183
- type: nauc_mrr_at_20_diff1
value: 20.685607874578192
- type: nauc_mrr_at_20_max
value: 31.89765440964895
- type: nauc_mrr_at_3_diff1
value: 22.385335765437958
- type: nauc_mrr_at_3_max
value: 32.47346568889047
- type: nauc_mrr_at_5_diff1
value: 21.173253596770003
- type: nauc_mrr_at_5_max
value: 32.2528418460596
- type: nauc_ndcg_at_1000_diff1
value: 18.08460876388022
- type: nauc_ndcg_at_1000_max
value: 30.282810360048217
- type: nauc_ndcg_at_100_diff1
value: 17.119539175602068
- type: nauc_ndcg_at_100_max
value: 29.66409825853174
- type: nauc_ndcg_at_10_diff1
value: 18.23254548133648
- type: nauc_ndcg_at_10_max
value: 31.52995550586078
- type: nauc_ndcg_at_1_diff1
value: 27.838566854973656
- type: nauc_ndcg_at_1_max
value: 32.0217755083183
- type: nauc_ndcg_at_20_diff1
value: 17.769003159911446
- type: nauc_ndcg_at_20_max
value: 30.929703630445033
- type: nauc_ndcg_at_3_diff1
value: 20.96979719261237
- type: nauc_ndcg_at_3_max
value: 32.363993132409526
- type: nauc_ndcg_at_5_diff1
value: 19.00106027591966
- type: nauc_ndcg_at_5_max
value: 31.962682994281664
- type: nauc_precision_at_1000_diff1
value: -0.439767274118902
- type: nauc_precision_at_1000_max
value: 12.247737195943136
- type: nauc_precision_at_100_diff1
value: 5.574224743755663
- type: nauc_precision_at_100_max
value: 20.625486141114006
- type: nauc_precision_at_10_diff1
value: 12.116438700823444
- type: nauc_precision_at_10_max
value: 30.027073824365324
- type: nauc_precision_at_1_diff1
value: 27.838566854973656
- type: nauc_precision_at_1_max
value: 32.0217755083183
- type: nauc_precision_at_20_diff1
value: 10.528730914479825
- type: nauc_precision_at_20_max
value: 28.101643683820228
- type: nauc_precision_at_3_diff1
value: 17.575083081784413
- type: nauc_precision_at_3_max
value: 32.04257948042897
- type: nauc_precision_at_5_diff1
value: 13.87097676219356
- type: nauc_precision_at_5_max
value: 31.186621554981798
- type: nauc_recall_at_1000_diff1
value: -0.4397672741187951
- type: nauc_recall_at_1000_max
value: 12.247737195943454
- type: nauc_recall_at_100_diff1
value: 5.574224743755691
- type: nauc_recall_at_100_max
value: 20.625486141114028
- type: nauc_recall_at_10_diff1
value: 12.116438700823482
- type: nauc_recall_at_10_max
value: 30.027073824365335
- type: nauc_recall_at_1_diff1
value: 27.838566854973656
- type: nauc_recall_at_1_max
value: 32.0217755083183
- type: nauc_recall_at_20_diff1
value: 10.528730914479794
- type: nauc_recall_at_20_max
value: 28.101643683820228
- type: nauc_recall_at_3_diff1
value: 17.57508308178443
- type: nauc_recall_at_3_max
value: 32.042579480429
- type: nauc_recall_at_5_diff1
value: 13.870976762193543
- type: nauc_recall_at_5_max
value: 31.186621554981787
- type: ndcg_at_1
value: 13.677
- type: ndcg_at_10
value: 25.191000000000003
- type: ndcg_at_100
value: 30.379
- type: ndcg_at_1000
value: 33.961999999999996
- type: ndcg_at_20
value: 27.1
- type: ndcg_at_3
value: 20.546
- type: ndcg_at_5
value: 22.505
- type: precision_at_1
value: 13.677
- type: precision_at_10
value: 3.853
- type: precision_at_100
value: 0.639
- type: precision_at_1000
value: 0.093
- type: precision_at_20
value: 2.3009999999999997
- type: precision_at_3
value: 8.477
- type: precision_at_5
value: 6.0440000000000005
- type: recall_at_1
value: 13.677
- type: recall_at_10
value: 38.534
- type: recall_at_100
value: 63.922999999999995
- type: recall_at_1000
value: 93.407
- type: recall_at_20
value: 46.028000000000006
- type: recall_at_3
value: 25.430000000000003
- type: recall_at_5
value: 30.220999999999997
- task:
type: PairClassification
dataset:
type: GEM/opusparcus
name: MTEB OpusparcusPC (fr)
config: fr
split: test
revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
metrics:
- type: cos_sim_accuracy
value: 82.9700272479564
- type: cos_sim_ap
value: 93.15021785539084
- type: cos_sim_f1
value: 87.97316722568279
- type: cos_sim_precision
value: 85.0
- type: cos_sim_recall
value: 91.16186693147964
- type: dot_accuracy
value: 82.9700272479564
- type: dot_ap
value: 93.15021785539084
- type: dot_f1
value: 87.97316722568279
- type: dot_precision
value: 85.0
- type: dot_recall
value: 91.16186693147964
- type: euclidean_accuracy
value: 82.9700272479564
- type: euclidean_ap
value: 93.15015081441638
- type: euclidean_f1
value: 87.97316722568279
- type: euclidean_precision
value: 85.0
- type: euclidean_recall
value: 91.16186693147964
- type: manhattan_accuracy
value: 82.56130790190735
- type: manhattan_ap
value: 93.14590481820592
- type: manhattan_f1
value: 87.86729857819905
- type: manhattan_precision
value: 84.04351767905711
- type: manhattan_recall
value: 92.05561072492552
- type: max_accuracy
value: 82.9700272479564
- type: max_ap
value: 93.15021785539084
- type: max_f1
value: 87.97316722568279
- task:
type: PairClassification
dataset:
type: google-research-datasets/paws-x
name: MTEB PawsX (fr)
config: fr
split: test
revision: 8a04d940a42cd40658986fdd8e3da561533a3646
metrics:
- type: cos_sim_accuracy
value: 64.14999999999999
- type: cos_sim_ap
value: 63.43794001840604
- type: cos_sim_f1
value: 62.59187620889749
- type: cos_sim_precision
value: 48.097502972651604
- type: cos_sim_recall
value: 89.59025470653378
- type: dot_accuracy
value: 64.14999999999999
- type: dot_ap
value: 63.52400235031554
- type: dot_f1
value: 62.59187620889749
- type: dot_precision
value: 48.097502972651604
- type: dot_recall
value: 89.59025470653378
- type: euclidean_accuracy
value: 64.14999999999999
- type: euclidean_ap
value: 63.43794001840604
- type: euclidean_f1
value: 62.59187620889749
- type: euclidean_precision
value: 48.097502972651604
- type: euclidean_recall
value: 89.59025470653378
- type: manhattan_accuracy
value: 64.2
- type: manhattan_ap
value: 63.46163243480347
- type: manhattan_f1
value: 62.540021344717175
- type: manhattan_precision
value: 46.069182389937104
- type: manhattan_recall
value: 97.34219269102991
- type: max_accuracy
value: 64.2
- type: max_ap
value: 63.52400235031554
- type: max_f1
value: 62.59187620889749
- task:
type: STS
dataset:
type: Lajavaness/SICK-fr
name: MTEB SICKFr
config: default
split: test
revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
metrics:
- type: cos_sim_pearson
value: 85.12347242597652
- type: cos_sim_spearman
value: 79.80580538857501
- type: euclidean_pearson
value: 82.03127787921382
- type: euclidean_spearman
value: 79.80580538857501
- type: manhattan_pearson
value: 82.02795155003601
- type: manhattan_spearman
value: 79.7808784011127
- task:
type: STS
dataset:
type: mteb/sts22-crosslingual-sts
name: MTEB STS22 (fr)
config: fr
split: test
revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
metrics:
- type: cos_sim_pearson
value: 82.34462624659417
- type: cos_sim_spearman
value: 82.83867899462683
- type: euclidean_pearson
value: 80.00679113308384
- type: euclidean_spearman
value: 82.83867899462683
- type: manhattan_pearson
value: 79.97582730301362
- type: manhattan_spearman
value: 82.95718926500541
- task:
type: STS
dataset:
type: mteb/stsb_multi_mt
name: MTEB STSBenchmarkMultilingualSTS (fr)
config: fr
split: test
revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c
metrics:
- type: cos_sim_pearson
value: 86.0897698618904
- type: cos_sim_spearman
value: 86.58814894229229
- type: euclidean_pearson
value: 85.53992615842806
- type: euclidean_spearman
value: 86.58814894229229
- type: manhattan_pearson
value: 85.4985023034774
- type: manhattan_spearman
value: 86.50239881298486
- task:
type: Summarization
dataset:
type: lyon-nlp/summarization-summeval-fr-p2p
name: MTEB SummEvalFr
config: default
split: test
revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
metrics:
- type: cos_sim_pearson
value: 30.458145110977753
- type: cos_sim_spearman
value: 31.624715940109265
- type: dot_pearson
value: 30.458145236239915
- type: dot_spearman
value: 31.624715940109265
- task:
type: Reranking
dataset:
type: lyon-nlp/mteb-fr-reranking-syntec-s2p
name: MTEB SyntecReranking
config: default
split: test
revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad
metrics:
- type: map
value: 87.60277777777777
- type: mrr
value: 87.60277777777777
- type: nAUC_map_diff1
value: 63.877496103879814
- type: nAUC_map_max
value: -4.8943605546581725
- type: nAUC_mrr_diff1
value: 63.877496103879814
- type: nAUC_mrr_max
value: -4.8943605546581725
- task:
type: Retrieval
dataset:
type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
name: MTEB SyntecRetrieval
config: default
split: test
revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9
metrics:
- type: map_at_1
value: 67.0
- type: map_at_10
value: 78.47800000000001
- type: map_at_100
value: 78.616
- type: map_at_1000
value: 78.616
- type: map_at_20
value: 78.52799999999999
- type: map_at_3
value: 77.833
- type: map_at_5
value: 78.033
- type: mrr_at_1
value: 67.0
- type: mrr_at_10
value: 78.47777777777777
- type: mrr_at_100
value: 78.61609758846066
- type: mrr_at_1000
value: 78.61609758846066
- type: mrr_at_20
value: 78.52777777777777
- type: mrr_at_3
value: 77.83333333333331
- type: mrr_at_5
value: 78.03333333333333
- type: nauc_map_at_1000_diff1
value: 54.76919250379753
- type: nauc_map_at_1000_max
value: 24.03294042759147
- type: nauc_map_at_100_diff1
value: 54.76919250379753
- type: nauc_map_at_100_max
value: 24.03294042759147
- type: nauc_map_at_10_diff1
value: 54.781660658782506
- type: nauc_map_at_10_max
value: 24.45332707633837
- type: nauc_map_at_1_diff1
value: 54.48189466912695
- type: nauc_map_at_1_max
value: 17.502666282947597
- type: nauc_map_at_20_diff1
value: 54.69518355408933
- type: nauc_map_at_20_max
value: 24.285263763068183
- type: nauc_map_at_3_diff1
value: 54.98928575752318
- type: nauc_map_at_3_max
value: 25.252117626643916
- type: nauc_map_at_5_diff1
value: 54.51750311747391
- type: nauc_map_at_5_max
value: 25.141479081321766
- type: nauc_mrr_at_1000_diff1
value: 54.76919250379753
- type: nauc_mrr_at_1000_max
value: 24.03294042759147
- type: nauc_mrr_at_100_diff1
value: 54.76919250379753
- type: nauc_mrr_at_100_max
value: 24.03294042759147
- type: nauc_mrr_at_10_diff1
value: 54.781660658782506
- type: nauc_mrr_at_10_max
value: 24.45332707633837
- type: nauc_mrr_at_1_diff1
value: 54.48189466912695
- type: nauc_mrr_at_1_max
value: 17.502666282947597
- type: nauc_mrr_at_20_diff1
value: 54.69518355408933
- type: nauc_mrr_at_20_max
value: 24.285263763068183
- type: nauc_mrr_at_3_diff1
value: 54.98928575752318
- type: nauc_mrr_at_3_max
value: 25.252117626643916
- type: nauc_mrr_at_5_diff1
value: 54.51750311747391
- type: nauc_mrr_at_5_max
value: 25.141479081321766
- type: nauc_ndcg_at_1000_diff1
value: 54.411394691026196
- type: nauc_ndcg_at_1000_max
value: 25.003182969921014
- type: nauc_ndcg_at_100_diff1
value: 54.411394691026196
- type: nauc_ndcg_at_100_max
value: 25.003182969921014
- type: nauc_ndcg_at_10_diff1
value: 53.97509194326736
- type: nauc_ndcg_at_10_max
value: 27.51736442048005
- type: nauc_ndcg_at_1_diff1
value: 54.48189466912695
- type: nauc_ndcg_at_1_max
value: 17.502666282947597
- type: nauc_ndcg_at_20_diff1
value: 53.46713794714154
- type: nauc_ndcg_at_20_max
value: 26.601577957753005
- type: nauc_ndcg_at_3_diff1
value: 54.521393171396525
- type: nauc_ndcg_at_3_max
value: 29.07380139412928
- type: nauc_ndcg_at_5_diff1
value: 53.42255297135452
- type: nauc_ndcg_at_5_max
value: 28.91110004742623
- type: nauc_precision_at_1000_diff1
value: nan
- type: nauc_precision_at_1000_max
value: nan
- type: nauc_precision_at_100_diff1
value: nan
- type: nauc_precision_at_100_max
value: nan
- type: nauc_precision_at_10_diff1
value: 41.59663865546228
- type: nauc_precision_at_10_max
value: 67.44864612511667
- type: nauc_precision_at_1_diff1
value: 54.48189466912695
- type: nauc_precision_at_1_max
value: 17.502666282947597
- type: nauc_precision_at_20_diff1
value: 26.486150015561265
- type: nauc_precision_at_20_max
value: 60.95549330843449
- type: nauc_precision_at_3_diff1
value: 50.78781512605074
- type: nauc_precision_at_3_max
value: 55.48552754435131
- type: nauc_precision_at_5_diff1
value: 43.75750300120062
- type: nauc_precision_at_5_max
value: 58.29665199413101
- type: nauc_recall_at_1000_diff1
value: nan
- type: nauc_recall_at_1000_max
value: nan
- type: nauc_recall_at_100_diff1
value: nan
- type: nauc_recall_at_100_max
value: nan
- type: nauc_recall_at_10_diff1
value: 41.59663865546242
- type: nauc_recall_at_10_max
value: 67.44864612511677
- type: nauc_recall_at_1_diff1
value: 54.48189466912695
- type: nauc_recall_at_1_max
value: 17.502666282947597
- type: nauc_recall_at_20_diff1
value: 26.486150015561737
- type: nauc_recall_at_20_max
value: 60.95549330843472
- type: nauc_recall_at_3_diff1
value: 50.787815126050376
- type: nauc_recall_at_3_max
value: 55.48552754435111
- type: nauc_recall_at_5_diff1
value: 43.75750300120054
- type: nauc_recall_at_5_max
value: 58.29665199413113
- type: ndcg_at_1
value: 67.0
- type: ndcg_at_10
value: 82.864
- type: ndcg_at_100
value: 83.672
- type: ndcg_at_1000
value: 83.672
- type: ndcg_at_20
value: 83.092
- type: ndcg_at_3
value: 81.464
- type: ndcg_at_5
value: 81.851
- type: precision_at_1
value: 67.0
- type: precision_at_10
value: 9.6
- type: precision_at_100
value: 1.0
- type: precision_at_1000
value: 0.1
- type: precision_at_20
value: 4.8500000000000005
- type: precision_at_3
value: 30.667
- type: precision_at_5
value: 18.6
- type: recall_at_1
value: 67.0
- type: recall_at_10
value: 96.0
- type: recall_at_100
value: 100.0
- type: recall_at_1000
value: 100.0
- type: recall_at_20
value: 97.0
- type: recall_at_3
value: 92.0
- type: recall_at_5
value: 93.0
- task:
type: Retrieval
dataset:
type: jinaai/xpqa
name: MTEB XPQARetrieval (fr)
config: fr
split: test
revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
metrics:
- type: map_at_1
value: 40.038000000000004
- type: map_at_10
value: 62.409000000000006
- type: map_at_100
value: 63.63999999999999
- type: map_at_1000
value: 63.693
- type: map_at_20
value: 63.165000000000006
- type: map_at_3
value: 55.364999999999995
- type: map_at_5
value: 59.95399999999999
- type: mrr_at_1
value: 62.88384512683578
- type: mrr_at_10
value: 70.414944794117
- type: mrr_at_100
value: 70.85679259651413
- type: mrr_at_1000
value: 70.8680806615119
- type: mrr_at_20
value: 70.69824986774621
- type: mrr_at_3
value: 68.04628393413438
- type: mrr_at_5
value: 69.65509568313303
- type: nauc_map_at_1000_diff1
value: 47.58306966781138
- type: nauc_map_at_1000_max
value: 49.99853404950863
- type: nauc_map_at_100_diff1
value: 47.5473544905194
- type: nauc_map_at_100_max
value: 49.98683021023155
- type: nauc_map_at_10_diff1
value: 47.443327641163705
- type: nauc_map_at_10_max
value: 49.31862257934493
- type: nauc_map_at_1_diff1
value: 55.93203426614159
- type: nauc_map_at_1_max
value: 27.467436111704224
- type: nauc_map_at_20_diff1
value: 47.454162467793985
- type: nauc_map_at_20_max
value: 49.715459382963765
- type: nauc_map_at_3_diff1
value: 48.910525378486874
- type: nauc_map_at_3_max
value: 42.13319318718595
- type: nauc_map_at_5_diff1
value: 48.56545403298638
- type: nauc_map_at_5_max
value: 47.311811085622445
- type: nauc_mrr_at_1000_diff1
value: 56.739822956274224
- type: nauc_mrr_at_1000_max
value: 58.274212468278854
- type: nauc_mrr_at_100_diff1
value: 56.7308210328899
- type: nauc_mrr_at_100_max
value: 58.27250671019899
- type: nauc_mrr_at_10_diff1
value: 56.647228471816405
- type: nauc_mrr_at_10_max
value: 58.210342990657495
- type: nauc_mrr_at_1_diff1
value: 58.618266167104046
- type: nauc_mrr_at_1_max
value: 58.55438607166539
- type: nauc_mrr_at_20_diff1
value: 56.63534799976597
- type: nauc_mrr_at_20_max
value: 58.17181317797869
- type: nauc_mrr_at_3_diff1
value: 56.815531582264825
- type: nauc_mrr_at_3_max
value: 58.32821204695344
- type: nauc_mrr_at_5_diff1
value: 56.79122022985127
- type: nauc_mrr_at_5_max
value: 58.20366609452701
- type: nauc_ndcg_at_1000_diff1
value: 49.530062263932194
- type: nauc_ndcg_at_1000_max
value: 53.473298956705925
- type: nauc_ndcg_at_100_diff1
value: 48.95703823297219
- type: nauc_ndcg_at_100_max
value: 53.191721124797276
- type: nauc_ndcg_at_10_diff1
value: 47.98530786084638
- type: nauc_ndcg_at_10_max
value: 51.155857323188016
- type: nauc_ndcg_at_1_diff1
value: 58.618266167104046
- type: nauc_ndcg_at_1_max
value: 58.55438607166539
- type: nauc_ndcg_at_20_diff1
value: 47.95544792051313
- type: nauc_ndcg_at_20_max
value: 51.751640167194054
- type: nauc_ndcg_at_3_diff1
value: 48.50900656884395
- type: nauc_ndcg_at_3_max
value: 50.78667595293348
- type: nauc_ndcg_at_5_diff1
value: 49.496100926859654
- type: nauc_ndcg_at_5_max
value: 49.089893886856835
- type: nauc_precision_at_1000_diff1
value: -19.085707327488784
- type: nauc_precision_at_1000_max
value: 22.16522736611267
- type: nauc_precision_at_100_diff1
value: -16.92930793417545
- type: nauc_precision_at_100_max
value: 26.119556898620655
- type: nauc_precision_at_10_diff1
value: -8.586758571265364
- type: nauc_precision_at_10_max
value: 34.29909350105018
- type: nauc_precision_at_1_diff1
value: 58.618266167104046
- type: nauc_precision_at_1_max
value: 58.55438607166539
- type: nauc_precision_at_20_diff1
value: -12.36545815755639
- type: nauc_precision_at_20_max
value: 30.779202784243694
- type: nauc_precision_at_3_diff1
value: 7.173290556095678
- type: nauc_precision_at_3_max
value: 43.244915594569356
- type: nauc_precision_at_5_diff1
value: -0.5308831428158323
- type: nauc_precision_at_5_max
value: 39.78478615216909
- type: nauc_recall_at_1000_diff1
value: 44.67738158424653
- type: nauc_recall_at_1000_max
value: 71.12276250795361
- type: nauc_recall_at_100_diff1
value: 30.071917991701135
- type: nauc_recall_at_100_max
value: 42.226214389979326
- type: nauc_recall_at_10_diff1
value: 36.275167481806804
- type: nauc_recall_at_10_max
value: 40.16796727800884
- type: nauc_recall_at_1_diff1
value: 55.93203426614159
- type: nauc_recall_at_1_max
value: 27.467436111704224
- type: nauc_recall_at_20_diff1
value: 32.189427460851505
- type: nauc_recall_at_20_max
value: 38.926081167758205
- type: nauc_recall_at_3_diff1
value: 43.959378195689894
- type: nauc_recall_at_3_max
value: 36.441633750156335
- type: nauc_recall_at_5_diff1
value: 42.6274479464408
- type: nauc_recall_at_5_max
value: 38.9902118898862
- type: ndcg_at_1
value: 62.88399999999999
- type: ndcg_at_10
value: 68.907
- type: ndcg_at_100
value: 72.896
- type: ndcg_at_1000
value: 73.721
- type: ndcg_at_20
value: 70.738
- type: ndcg_at_3
value: 62.731
- type: ndcg_at_5
value: 65.191
- type: precision_at_1
value: 62.88399999999999
- type: precision_at_10
value: 16.101
- type: precision_at_100
value: 1.951
- type: precision_at_1000
value: 0.20600000000000002
- type: precision_at_20
value: 8.705
- type: precision_at_3
value: 38.095
- type: precision_at_5
value: 27.904
- type: recall_at_1
value: 40.038000000000004
- type: recall_at_10
value: 79.237
- type: recall_at_100
value: 94.17699999999999
- type: recall_at_1000
value: 99.466
- type: recall_at_20
value: 85.027
- type: recall_at_3
value: 60.336
- type: recall_at_5
value: 70.122
license: apache-2.0
language:
- fr
metrics:
- pearsonr
- spearmanr
---
# [bilingual-document-embedding](https://huggingface.co/Lajavaness/bilingual-document-embedding)
bilingual-document-embedding is an embedding model for documents in two languages, French and English, with a context length of up to 8096 tokens. It is a specialized sentence-embedding model trained specifically for this language pair and built on [BGE M3](https://huggingface.co/BAAI/bge-m3), a pre-trained multilingual embedding model. The model uses an XLM-RoBERTa encoder to map English and French sentences into a 1024-dimensional vector space, supporting applications ranging from semantic search to text clustering. The embeddings capture the nuanced meanings of English and French sentences, reflecting both the lexical and contextual layers of the language.
## Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
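The pooling and normalization modules listed above can be illustrated in isolation. The following is a minimal pure-Python sketch, not the library's implementation: it averages token vectors under an attention mask (as `pooling_mode_mean_tokens: True` indicates) and then scales the result to unit length (as `Normalize()` does). The function names and the toy 2-dimensional vectors are assumptions made for illustration; the real model works with 1024 dimensions.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy input: three token vectors, the last one is padding and is masked out.
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.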
## Training and Fine-tuning process
### Stage 1: NLI Training
- Dataset: SNLI and XNLI (English and French)
- Method: Training with Multiple Negatives Ranking Loss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics.
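To make the ranking objective concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking loss over one batch. It is an illustration under stated assumptions, not the training code used for this model: each anchor is paired with the positive at the same index, all other positives in the batch serve as negatives, and `scale=20.0` is an assumed value for the similarity multiplier. With NLI data, a common setup (also assumed here) is to use the premise as anchor and an entailed hypothesis as positive.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positives[j] in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch: correctly aligned pairs give a loss near zero,
# swapped pairs give a large loss.
aligned = multiple_negatives_ranking_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
swapped = multiple_negatives_ranking_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
)
print(aligned, swapped)
```

The loss falls as each anchor becomes more similar to its own positive than to the other positives in the batch, which is why larger batches provide more negatives per step.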
### Stage 3: Continued Fine-tuning for Semantic Textual Similarity on STS Benchmark
- Dataset: STSB (French and English)
- Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.
### Stage 4: Advanced Augmentation Fine-tuning
- Dataset: STSB augmented with [silver samples generated from gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
- Method: Employed an advanced strategy using [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data dynamically, enhancing the model's robustness and accuracy.
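The silver-sample idea can be sketched as follows. This is a simplified illustration and not the pipeline used for this model: a cross-encoder assigns scores to sentence pairs that have no human label, and the resulting silver pairs are added to the gold pairs before the bi-encoder is trained. The helper names are hypothetical, `toy_cross_encoder_score` is a word-overlap stand-in for a trained cross-encoder, and exhaustively enumerating all pairs replaces the pair sampling strategies described in the Augmented SBERT paper.

```python
from itertools import combinations

def build_silver_pairs(sentences, gold_pairs, cross_encoder_score):
    """Score every sentence pair that lacks a gold label with the cross-encoder."""
    gold_keys = {frozenset((a, b)) for a, b, _ in gold_pairs}
    silver = []
    for a, b in combinations(sentences, 2):
        if frozenset((a, b)) in gold_keys:
            continue  # this pair already has a human label
        silver.append((a, b, cross_encoder_score(a, b)))
    return silver

def toy_cross_encoder_score(a, b):
    """Hypothetical stand-in: Jaccard word overlap instead of a trained model."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

# One gold pair with a human similarity score in [0, 1].
gold = [("a cat sleeps", "a cat naps", 0.9)]
sentences = ["a cat sleeps", "a cat naps", "a dog runs"]

silver = build_silver_pairs(sentences, gold, toy_cross_encoder_score)
training_set = gold + silver  # the bi-encoder would be fine-tuned on both
print(training_set)
```

In practice the number of candidate pairs grows quadratically with the number of sentences, which is why a sampling strategy is used to pick which pairs the cross-encoder labels.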
## Usage
Once [sentence-transformers](https://www.SBERT.net) is installed, the model is straightforward to use:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]
model = SentenceTransformer('Lajavaness/bilingual-document-embedding', trust_remote_code=True)
# Encode the sentences into embedding vectors before printing them
embeddings = model.encode(sentences)
print(embeddings)
```
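For semantic search, embeddings are typically compared by similarity and the candidates sorted by score. The sketch below is a minimal illustration with toy unit-length vectors standing in for the model's embeddings; `rank_by_similarity` is a hypothetical helper, not part of sentence-transformers. It relies on the fact that for unit-length vectors, such as those produced by a final `Normalize()` module, the dot product equals the cosine similarity.

```python
def rank_by_similarity(query_embedding, document_embeddings):
    """Return (index, score) pairs sorted by descending dot product."""
    scores = [
        sum(q * d for q, d in zip(query_embedding, doc))
        for doc in document_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors; real embeddings from this model have 1024 dimensions.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_similarity(query, documents)
print(ranking)
```

The first entry of `ranking` is the index of the document closest to the query.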
## Evaluation
TODO
## Citation
@article{chen2024bge,
title={Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation},
author={Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng},
journal={arXiv preprint arXiv:2402.03216},
year={2024}
}
@article{conneau2019unsupervised,
title={Unsupervised cross-lingual representation learning at scale},
author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
journal={arXiv preprint arXiv:1911.02116},
year={2019}
}
@article{reimers2019sentence,
title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
author={Reimers, Nils and Gurevych, Iryna},
journal={arXiv preprint arXiv:1908.10084},
year={2019}
}
@article{thakur2020augmented,
title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
journal={arXiv e-prints},
pages={arXiv--2010},
year={2020}