dangvantuan's picture
Update README.md
40ac503 verified
metadata
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
  - sentence-embedding
  - mteb
model-index:
  - name: bilingual-document-embedding
    results:
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloProfClusteringP2P
          config: default
          split: test
          revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
        metrics:
          - type: v_measure
            value: 55.52298673909706
          - type: v_measures
            value:
              - 0.5198748380785404
              - 0.5562521099012603
              - 0.5322986254464575
              - 0.5722250987615152
              - 0.532932258758668
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloProfClusteringS2S
          config: default
          split: test
          revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
        metrics:
          - type: v_measure
            value: 35.802733348353094
          - type: v_measures
            value:
              - 0.37359796790048144
              - 0.36376421464272285
              - 0.37524966704915225
              - 0.3749296797757371
              - 0.36673700158106576
      - task:
          type: Reranking
        dataset:
          type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
          name: MTEB AlloprofReranking
          config: default
          split: test
          revision: 65393d0d7a08a10b4e348135e824f385d420b0fd
        metrics:
          - type: map
            value: 73.10088493122083
          - type: mrr
            value: 74.33452929243086
          - type: nAUC_map_diff1
            value: 56.63750231223696
          - type: nAUC_map_max
            value: 27.066268470355492
          - type: nAUC_mrr_diff1
            value: 55.33487252773409
          - type: nAUC_mrr_max
            value: 27.328424865584367
      - task:
          type: Retrieval
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloprofRetrieval
          config: default
          split: test
          revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd
        metrics:
          - type: map_at_1
            value: 28.282
          - type: map_at_10
            value: 38.805
          - type: map_at_100
            value: 39.804
          - type: map_at_1000
            value: 39.859
          - type: map_at_20
            value: 39.428999999999995
          - type: map_at_3
            value: 35.838
          - type: map_at_5
            value: 37.537
          - type: mrr_at_1
            value: 28.281519861830745
          - type: mrr_at_10
            value: 38.805171752063
          - type: mrr_at_100
            value: 39.80444636265999
          - type: mrr_at_1000
            value: 39.858722309770464
          - type: mrr_at_20
            value: 39.42867574383368
          - type: mrr_at_3
            value: 35.83765112262528
          - type: mrr_at_5
            value: 37.53670120898111
          - type: nauc_map_at_1000_diff1
            value: 42.44333601195352
          - type: nauc_map_at_1000_max
            value: 41.88361927698375
          - type: nauc_map_at_100_diff1
            value: 42.42746715522874
          - type: nauc_map_at_100_max
            value: 41.913701611267015
          - type: nauc_map_at_10_diff1
            value: 42.25094726032311
          - type: nauc_map_at_10_max
            value: 41.76772459035808
          - type: nauc_map_at_1_diff1
            value: 48.03355307282109
          - type: nauc_map_at_1_max
            value: 38.73226070718987
          - type: nauc_map_at_20_diff1
            value: 42.3550770875435
          - type: nauc_map_at_20_max
            value: 41.89957212013687
          - type: nauc_map_at_3_diff1
            value: 42.88695727955848
          - type: nauc_map_at_3_max
            value: 40.81262402287836
          - type: nauc_map_at_5_diff1
            value: 42.34041989483334
          - type: nauc_map_at_5_max
            value: 41.36458206255729
          - type: nauc_mrr_at_1000_diff1
            value: 42.44333601195352
          - type: nauc_mrr_at_1000_max
            value: 41.88361927698375
          - type: nauc_mrr_at_100_diff1
            value: 42.42746715522874
          - type: nauc_mrr_at_100_max
            value: 41.913701611267015
          - type: nauc_mrr_at_10_diff1
            value: 42.25094726032311
          - type: nauc_mrr_at_10_max
            value: 41.76772459035808
          - type: nauc_mrr_at_1_diff1
            value: 48.03355307282109
          - type: nauc_mrr_at_1_max
            value: 38.73226070718987
          - type: nauc_mrr_at_20_diff1
            value: 42.3550770875435
          - type: nauc_mrr_at_20_max
            value: 41.89957212013687
          - type: nauc_mrr_at_3_diff1
            value: 42.88695727955848
          - type: nauc_mrr_at_3_max
            value: 40.81262402287836
          - type: nauc_mrr_at_5_diff1
            value: 42.34041989483334
          - type: nauc_mrr_at_5_max
            value: 41.36458206255729
          - type: nauc_ndcg_at_1000_diff1
            value: 41.35830258715452
          - type: nauc_ndcg_at_1000_max
            value: 43.2765379475269
          - type: nauc_ndcg_at_100_diff1
            value: 40.95047094384412
          - type: nauc_ndcg_at_100_max
            value: 44.293436979483594
          - type: nauc_ndcg_at_10_diff1
            value: 40.0359339979518
          - type: nauc_ndcg_at_10_max
            value: 43.390909520520076
          - type: nauc_ndcg_at_1_diff1
            value: 48.03355307282109
          - type: nauc_ndcg_at_1_max
            value: 38.73226070718987
          - type: nauc_ndcg_at_20_diff1
            value: 40.35056898575259
          - type: nauc_ndcg_at_20_max
            value: 43.991764985610345
          - type: nauc_ndcg_at_3_diff1
            value: 41.40129960980627
          - type: nauc_ndcg_at_3_max
            value: 41.41104483378663
          - type: nauc_ndcg_at_5_diff1
            value: 40.36007384476364
          - type: nauc_ndcg_at_5_max
            value: 42.383481303106414
          - type: nauc_precision_at_1000_diff1
            value: 34.36657255351115
          - type: nauc_precision_at_1000_max
            value: 74.91976431868189
          - type: nauc_precision_at_100_diff1
            value: 33.55702830592739
          - type: nauc_precision_at_100_max
            value: 65.71347416493107
          - type: nauc_precision_at_10_diff1
            value: 32.521448032129
          - type: nauc_precision_at_10_max
            value: 49.19930788953475
          - type: nauc_precision_at_1_diff1
            value: 48.03355307282109
          - type: nauc_precision_at_1_max
            value: 38.73226070718987
          - type: nauc_precision_at_20_diff1
            value: 32.57892299703891
          - type: nauc_precision_at_20_max
            value: 53.45967162017302
          - type: nauc_precision_at_3_diff1
            value: 37.249551957650795
          - type: nauc_precision_at_3_max
            value: 43.08267504682664
          - type: nauc_precision_at_5_diff1
            value: 34.44393985129692
          - type: nauc_precision_at_5_max
            value: 45.460096642832646
          - type: nauc_recall_at_1000_diff1
            value: 34.36657255350965
          - type: nauc_recall_at_1000_max
            value: 74.91976431868211
          - type: nauc_recall_at_100_diff1
            value: 33.55702830592741
          - type: nauc_recall_at_100_max
            value: 65.71347416493116
          - type: nauc_recall_at_10_diff1
            value: 32.52144803212901
          - type: nauc_recall_at_10_max
            value: 49.199307889534715
          - type: nauc_recall_at_1_diff1
            value: 48.03355307282109
          - type: nauc_recall_at_1_max
            value: 38.73226070718987
          - type: nauc_recall_at_20_diff1
            value: 32.57892299703892
          - type: nauc_recall_at_20_max
            value: 53.45967162017302
          - type: nauc_recall_at_3_diff1
            value: 37.249551957650766
          - type: nauc_recall_at_3_max
            value: 43.082675046826644
          - type: nauc_recall_at_5_diff1
            value: 34.4439398512969
          - type: nauc_recall_at_5_max
            value: 45.460096642832674
          - type: ndcg_at_1
            value: 28.282
          - type: ndcg_at_10
            value: 44.421
          - type: ndcg_at_100
            value: 49.447
          - type: ndcg_at_1000
            value: 50.981
          - type: ndcg_at_20
            value: 46.671
          - type: ndcg_at_3
            value: 38.289
          - type: ndcg_at_5
            value: 41.349999999999994
          - type: precision_at_1
            value: 28.282
          - type: precision_at_10
            value: 6.231
          - type: precision_at_100
            value: 0.8619999999999999
          - type: precision_at_1000
            value: 0.098
          - type: precision_at_20
            value: 3.558
          - type: precision_at_3
            value: 15.126999999999999
          - type: precision_at_5
            value: 10.561
          - type: recall_at_1
            value: 28.282
          - type: recall_at_10
            value: 62.306
          - type: recall_at_100
            value: 86.226
          - type: recall_at_1000
            value: 98.489
          - type: recall_at_20
            value: 71.15700000000001
          - type: recall_at_3
            value: 45.379999999999995
          - type: recall_at_5
            value: 52.807
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_reviews_multi
          name: MTEB AmazonReviewsClassification (fr)
          config: fr
          split: test
          revision: 1399c76144fd37290681b995c656ef9b2e06e26d
        metrics:
          - type: accuracy
            value: 44.10999999999999
          - type: f1
            value: 42.00584553745547
          - type: f1_weighted
            value: 42.005845537455485
      - task:
          type: Retrieval
        dataset:
          type: maastrichtlawtech/bsard
          name: MTEB BSARDRetrieval
          config: default
          split: test
          revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
        metrics:
          - type: map_at_1
            value: 4.955
          - type: map_at_10
            value: 9.103
          - type: map_at_100
            value: 9.998999999999999
          - type: map_at_1000
            value: 10.136000000000001
          - type: map_at_20
            value: 9.554
          - type: map_at_3
            value: 7.4319999999999995
          - type: map_at_5
            value: 7.95
          - type: mrr_at_1
            value: 4.954954954954955
          - type: mrr_at_10
            value: 9.102852852852852
          - type: mrr_at_100
            value: 9.999215850941926
          - type: mrr_at_1000
            value: 10.13616308946331
          - type: mrr_at_20
            value: 9.554402632963003
          - type: mrr_at_3
            value: 7.432432432432434
          - type: mrr_at_5
            value: 7.950450450450451
          - type: nauc_map_at_1000_diff1
            value: 14.655819915811785
          - type: nauc_map_at_1000_max
            value: 9.188182207979008
          - type: nauc_map_at_100_diff1
            value: 14.517637755979687
          - type: nauc_map_at_100_max
            value: 9.060725563022503
          - type: nauc_map_at_10_diff1
            value: 15.776144582905358
          - type: nauc_map_at_10_max
            value: 9.448668398689462
          - type: nauc_map_at_1_diff1
            value: 19.10921794840591
          - type: nauc_map_at_1_max
            value: 4.060331068810239
          - type: nauc_map_at_20_diff1
            value: 15.061809327427353
          - type: nauc_map_at_20_max
            value: 9.085953657690329
          - type: nauc_map_at_3_diff1
            value: 18.42793018906856
          - type: nauc_map_at_3_max
            value: 10.10140103912974
          - type: nauc_map_at_5_diff1
            value: 17.407972669931233
          - type: nauc_map_at_5_max
            value: 10.064885264376228
          - type: nauc_mrr_at_1000_diff1
            value: 14.655819915811785
          - type: nauc_mrr_at_1000_max
            value: 9.188182207979008
          - type: nauc_mrr_at_100_diff1
            value: 14.517637755979687
          - type: nauc_mrr_at_100_max
            value: 9.060725563022503
          - type: nauc_mrr_at_10_diff1
            value: 15.776144582905358
          - type: nauc_mrr_at_10_max
            value: 9.448668398689462
          - type: nauc_mrr_at_1_diff1
            value: 19.10921794840591
          - type: nauc_mrr_at_1_max
            value: 4.060331068810239
          - type: nauc_mrr_at_20_diff1
            value: 15.061809327427353
          - type: nauc_mrr_at_20_max
            value: 9.085953657690329
          - type: nauc_mrr_at_3_diff1
            value: 18.42793018906856
          - type: nauc_mrr_at_3_max
            value: 10.10140103912974
          - type: nauc_mrr_at_5_diff1
            value: 17.407972669931233
          - type: nauc_mrr_at_5_max
            value: 10.064885264376228
          - type: nauc_ndcg_at_1000_diff1
            value: 11.940580725648152
          - type: nauc_ndcg_at_1000_max
            value: 11.004283166102807
          - type: nauc_ndcg_at_100_diff1
            value: 10.009680762933215
          - type: nauc_ndcg_at_100_max
            value: 8.444186642393188
          - type: nauc_ndcg_at_10_diff1
            value: 14.423251037136561
          - type: nauc_ndcg_at_10_max
            value: 10.614014795363303
          - type: nauc_ndcg_at_1_diff1
            value: 19.10921794840591
          - type: nauc_ndcg_at_1_max
            value: 4.060331068810239
          - type: nauc_ndcg_at_20_diff1
            value: 12.486198272876521
          - type: nauc_ndcg_at_20_max
            value: 9.550225653436467
          - type: nauc_ndcg_at_3_diff1
            value: 18.813915768129757
          - type: nauc_ndcg_at_3_max
            value: 11.865670858870484
          - type: nauc_ndcg_at_5_diff1
            value: 17.01715479783127
          - type: nauc_ndcg_at_5_max
            value: 11.523181173967899
          - type: nauc_precision_at_1000_diff1
            value: 6.162580085242911
          - type: nauc_precision_at_1000_max
            value: 21.74545120171883
          - type: nauc_precision_at_100_diff1
            value: 1.4492186570094137
          - type: nauc_precision_at_100_max
            value: 5.320582161712451
          - type: nauc_precision_at_10_diff1
            value: 12.199838986983115
          - type: nauc_precision_at_10_max
            value: 12.409471572004998
          - type: nauc_precision_at_1_diff1
            value: 19.10921794840591
          - type: nauc_precision_at_1_max
            value: 4.060331068810239
          - type: nauc_precision_at_20_diff1
            value: 8.089525252638769
          - type: nauc_precision_at_20_max
            value: 9.829600854870332
          - type: nauc_precision_at_3_diff1
            value: 19.71630962813128
          - type: nauc_precision_at_3_max
            value: 15.560242379569136
          - type: nauc_precision_at_5_diff1
            value: 16.151579517326258
          - type: nauc_precision_at_5_max
            value: 14.225120177799683
          - type: nauc_recall_at_1000_diff1
            value: 6.1625800852429595
          - type: nauc_recall_at_1000_max
            value: 21.745451201718687
          - type: nauc_recall_at_100_diff1
            value: 1.4492186570093863
          - type: nauc_recall_at_100_max
            value: 5.320582161712405
          - type: nauc_recall_at_10_diff1
            value: 12.199838986983083
          - type: nauc_recall_at_10_max
            value: 12.409471572004962
          - type: nauc_recall_at_1_diff1
            value: 19.10921794840591
          - type: nauc_recall_at_1_max
            value: 4.060331068810239
          - type: nauc_recall_at_20_diff1
            value: 8.089525252638692
          - type: nauc_recall_at_20_max
            value: 9.829600854870273
          - type: nauc_recall_at_3_diff1
            value: 19.716309628131278
          - type: nauc_recall_at_3_max
            value: 15.560242379569129
          - type: nauc_recall_at_5_diff1
            value: 16.151579517326265
          - type: nauc_recall_at_5_max
            value: 14.225120177799697
          - type: ndcg_at_1
            value: 4.955
          - type: ndcg_at_10
            value: 12.005
          - type: ndcg_at_100
            value: 17.238
          - type: ndcg_at_1000
            value: 21.287
          - type: ndcg_at_20
            value: 13.691999999999998
          - type: ndcg_at_3
            value: 8.296000000000001
          - type: ndcg_at_5
            value: 9.225999999999999
          - type: precision_at_1
            value: 4.955
          - type: precision_at_10
            value: 2.162
          - type: precision_at_100
            value: 0.482
          - type: precision_at_1000
            value: 0.08099999999999999
          - type: precision_at_20
            value: 1.419
          - type: precision_at_3
            value: 3.604
          - type: precision_at_5
            value: 2.613
          - type: recall_at_1
            value: 4.955
          - type: recall_at_10
            value: 21.622
          - type: recall_at_100
            value: 48.198
          - type: recall_at_1000
            value: 81.081
          - type: recall_at_20
            value: 28.377999999999997
          - type: recall_at_3
            value: 10.811
          - type: recall_at_5
            value: 13.062999999999999
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/clustering-hal-s2s
          name: MTEB HALClusteringS2S
          config: default
          split: test
          revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
        metrics:
          - type: v_measure
            value: 23.137623974622365
          - type: v_measures
            value:
              - 0.2802068838665942
              - 0.2565274984774815
              - 0.25245022445056786
              - 0.22595460950575297
              - 0.20177741591393913
      - task:
          type: Clustering
        dataset:
          type: reciTAL/mlsum
          name: MTEB MLSUMClusteringP2P
          config: default
          split: test
          revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
        metrics:
          - type: v_measure
            value: 40.31003279146524
          - type: v_measures
            value:
              - 0.3943651322813771
              - 0.4189000344922205
              - 0.4101443880670743
              - 0.3832149080991847
              - 0.37602613534689566
      - task:
          type: Clustering
        dataset:
          type: reciTAL/mlsum
          name: MTEB MLSUMClusteringS2S
          config: default
          split: test
          revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
        metrics:
          - type: v_measure
            value: 40.04524841336757
          - type: v_measures
            value:
              - 0.39835449199860185
              - 0.405905613221237
              - 0.40326782414397255
              - 0.40882879348632284
              - 0.3683302592759367
      - task:
          type: Classification
        dataset:
          type: mteb/mtop_domain
          name: MTEB MTOPDomainClassification (fr)
          config: fr
          split: test
          revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
        metrics:
          - type: accuracy
            value: 87.82023175696837
          - type: f1
            value: 87.58287510797385
          - type: f1_weighted
            value: 87.75645870762435
      - task:
          type: Classification
        dataset:
          type: mteb/mtop_intent
          name: MTEB MTOPIntentClassification (fr)
          config: fr
          split: test
          revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        metrics:
          - type: accuracy
            value: 58.628249295333546
          - type: f1
            value: 42.22070573172825
          - type: f1_weighted
            value: 60.62087995743649
      - task:
          type: Classification
        dataset:
          type: mteb/masakhanews
          name: MTEB MasakhaNEWSClassification (fra)
          config: fra
          split: test
          revision: 18193f187b92da67168c655c9973a165ed9593dd
        metrics:
          - type: accuracy
            value: 69.81042654028435
          - type: f1
            value: 66.05811881796396
          - type: f1_weighted
            value: 70.34901566149948
      - task:
          type: Clustering
        dataset:
          type: masakhane/masakhanews
          name: MTEB MasakhaNEWSClusteringP2P (fra)
          config: fra
          split: test
          revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
        metrics:
          - type: v_measure
            value: 45.02712178986078
          - type: v_measures
            value:
              - 1
              - 0.23955793240111928
              - 0.7158920010774062
              - 0.036391635653837
              - 0.25951452036067674
      - task:
          type: Clustering
        dataset:
          type: masakhane/masakhanews
          name: MTEB MasakhaNEWSClusteringS2S (fra)
          config: fra
          split: test
          revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
        metrics:
          - type: v_measure
            value: 30.38607254306223
          - type: v_measures
            value:
              - 1
              - 0.01936507478006705
              - 0.19876372667844472
              - 0.17182595867380823
              - 0.12934886702079137
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_intent
          name: MTEB MassiveIntentClassification (fr)
          config: fr
          split: test
          revision: 4672e20407010da34463acc759c162ca9734bca6
        metrics:
          - type: accuracy
            value: 66.13651647612645
          - type: f1
            value: 64.42898347709598
          - type: f1_weighted
            value: 65.01442547020224
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_scenario
          name: MTEB MassiveScenarioClassification (fr)
          config: fr
          split: test
          revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
        metrics:
          - type: accuracy
            value: 72.73705447209144
          - type: f1
            value: 72.09285609231057
          - type: f1_weighted
            value: 72.34295244611339
      - task:
          type: Retrieval
        dataset:
          type: jinaai/mintakaqa
          name: MTEB MintakaRetrieval (fr)
          config: fr
          split: test
          revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
        metrics:
          - type: map_at_1
            value: 13.677
          - type: map_at_10
            value: 21.044
          - type: map_at_100
            value: 22.012
          - type: map_at_1000
            value: 22.125
          - type: map_at_20
            value: 21.573999999999998
          - type: map_at_3
            value: 18.857
          - type: map_at_5
            value: 19.936999999999998
          - type: mrr_at_1
            value: 13.677313677313677
          - type: mrr_at_10
            value: 21.043933543933505
          - type: mrr_at_100
            value: 22.012160523798318
          - type: mrr_at_1000
            value: 22.124555014776913
          - type: mrr_at_20
            value: 21.574199922074904
          - type: mrr_at_3
            value: 18.857493857493825
          - type: mrr_at_5
            value: 19.93652743652738
          - type: nauc_map_at_1000_diff1
            value: 20.63087633823352
          - type: nauc_map_at_1000_max
            value: 31.753246807516362
          - type: nauc_map_at_100_diff1
            value: 20.602874174259885
          - type: nauc_map_at_100_max
            value: 31.74109681792161
          - type: nauc_map_at_10_diff1
            value: 20.82028964049537
          - type: nauc_map_at_10_max
            value: 32.082751313883705
          - type: nauc_map_at_1_diff1
            value: 27.838566854973656
          - type: nauc_map_at_1_max
            value: 32.0217755083183
          - type: nauc_map_at_20_diff1
            value: 20.685607874578192
          - type: nauc_map_at_20_max
            value: 31.89765440964895
          - type: nauc_map_at_3_diff1
            value: 22.385335765437958
          - type: nauc_map_at_3_max
            value: 32.47346568889047
          - type: nauc_map_at_5_diff1
            value: 21.173253596770003
          - type: nauc_map_at_5_max
            value: 32.2528418460596
          - type: nauc_mrr_at_1000_diff1
            value: 20.63087633823352
          - type: nauc_mrr_at_1000_max
            value: 31.753246807516362
          - type: nauc_mrr_at_100_diff1
            value: 20.602874174259885
          - type: nauc_mrr_at_100_max
            value: 31.74109681792161
          - type: nauc_mrr_at_10_diff1
            value: 20.82028964049537
          - type: nauc_mrr_at_10_max
            value: 32.082751313883705
          - type: nauc_mrr_at_1_diff1
            value: 27.838566854973656
          - type: nauc_mrr_at_1_max
            value: 32.0217755083183
          - type: nauc_mrr_at_20_diff1
            value: 20.685607874578192
          - type: nauc_mrr_at_20_max
            value: 31.89765440964895
          - type: nauc_mrr_at_3_diff1
            value: 22.385335765437958
          - type: nauc_mrr_at_3_max
            value: 32.47346568889047
          - type: nauc_mrr_at_5_diff1
            value: 21.173253596770003
          - type: nauc_mrr_at_5_max
            value: 32.2528418460596
          - type: nauc_ndcg_at_1000_diff1
            value: 18.08460876388022
          - type: nauc_ndcg_at_1000_max
            value: 30.282810360048217
          - type: nauc_ndcg_at_100_diff1
            value: 17.119539175602068
          - type: nauc_ndcg_at_100_max
            value: 29.66409825853174
          - type: nauc_ndcg_at_10_diff1
            value: 18.23254548133648
          - type: nauc_ndcg_at_10_max
            value: 31.52995550586078
          - type: nauc_ndcg_at_1_diff1
            value: 27.838566854973656
          - type: nauc_ndcg_at_1_max
            value: 32.0217755083183
          - type: nauc_ndcg_at_20_diff1
            value: 17.769003159911446
          - type: nauc_ndcg_at_20_max
            value: 30.929703630445033
          - type: nauc_ndcg_at_3_diff1
            value: 20.96979719261237
          - type: nauc_ndcg_at_3_max
            value: 32.363993132409526
          - type: nauc_ndcg_at_5_diff1
            value: 19.00106027591966
          - type: nauc_ndcg_at_5_max
            value: 31.962682994281664
          - type: nauc_precision_at_1000_diff1
            value: -0.439767274118902
          - type: nauc_precision_at_1000_max
            value: 12.247737195943136
          - type: nauc_precision_at_100_diff1
            value: 5.574224743755663
          - type: nauc_precision_at_100_max
            value: 20.625486141114006
          - type: nauc_precision_at_10_diff1
            value: 12.116438700823444
          - type: nauc_precision_at_10_max
            value: 30.027073824365324
          - type: nauc_precision_at_1_diff1
            value: 27.838566854973656
          - type: nauc_precision_at_1_max
            value: 32.0217755083183
          - type: nauc_precision_at_20_diff1
            value: 10.528730914479825
          - type: nauc_precision_at_20_max
            value: 28.101643683820228
          - type: nauc_precision_at_3_diff1
            value: 17.575083081784413
          - type: nauc_precision_at_3_max
            value: 32.04257948042897
          - type: nauc_precision_at_5_diff1
            value: 13.87097676219356
          - type: nauc_precision_at_5_max
            value: 31.186621554981798
          - type: nauc_recall_at_1000_diff1
            value: -0.4397672741187951
          - type: nauc_recall_at_1000_max
            value: 12.247737195943454
          - type: nauc_recall_at_100_diff1
            value: 5.574224743755691
          - type: nauc_recall_at_100_max
            value: 20.625486141114028
          - type: nauc_recall_at_10_diff1
            value: 12.116438700823482
          - type: nauc_recall_at_10_max
            value: 30.027073824365335
          - type: nauc_recall_at_1_diff1
            value: 27.838566854973656
          - type: nauc_recall_at_1_max
            value: 32.0217755083183
          - type: nauc_recall_at_20_diff1
            value: 10.528730914479794
          - type: nauc_recall_at_20_max
            value: 28.101643683820228
          - type: nauc_recall_at_3_diff1
            value: 17.57508308178443
          - type: nauc_recall_at_3_max
            value: 32.042579480429
          - type: nauc_recall_at_5_diff1
            value: 13.870976762193543
          - type: nauc_recall_at_5_max
            value: 31.186621554981787
          - type: ndcg_at_1
            value: 13.677
          - type: ndcg_at_10
            value: 25.191000000000003
          - type: ndcg_at_100
            value: 30.379
          - type: ndcg_at_1000
            value: 33.961999999999996
          - type: ndcg_at_20
            value: 27.1
          - type: ndcg_at_3
            value: 20.546
          - type: ndcg_at_5
            value: 22.505
          - type: precision_at_1
            value: 13.677
          - type: precision_at_10
            value: 3.853
          - type: precision_at_100
            value: 0.639
          - type: precision_at_1000
            value: 0.093
          - type: precision_at_20
            value: 2.3009999999999997
          - type: precision_at_3
            value: 8.477
          - type: precision_at_5
            value: 6.0440000000000005
          - type: recall_at_1
            value: 13.677
          - type: recall_at_10
            value: 38.534
          - type: recall_at_100
            value: 63.922999999999995
          - type: recall_at_1000
            value: 93.407
          - type: recall_at_20
            value: 46.028000000000006
          - type: recall_at_3
            value: 25.430000000000003
          - type: recall_at_5
            value: 30.220999999999997
      - task:
          type: PairClassification
        dataset:
          type: GEM/opusparcus
          name: MTEB OpusparcusPC (fr)
          config: fr
          split: test
          revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
        metrics:
          - type: cos_sim_accuracy
            value: 82.9700272479564
          - type: cos_sim_ap
            value: 93.15021785539084
          - type: cos_sim_f1
            value: 87.97316722568279
          - type: cos_sim_precision
            value: 85
          - type: cos_sim_recall
            value: 91.16186693147964
          - type: dot_accuracy
            value: 82.9700272479564
          - type: dot_ap
            value: 93.15021785539084
          - type: dot_f1
            value: 87.97316722568279
          - type: dot_precision
            value: 85
          - type: dot_recall
            value: 91.16186693147964
          - type: euclidean_accuracy
            value: 82.9700272479564
          - type: euclidean_ap
            value: 93.15015081441638
          - type: euclidean_f1
            value: 87.97316722568279
          - type: euclidean_precision
            value: 85
          - type: euclidean_recall
            value: 91.16186693147964
          - type: manhattan_accuracy
            value: 82.56130790190736
          - type: manhattan_ap
            value: 93.14590481820592
          - type: manhattan_f1
            value: 87.86729857819905
          - type: manhattan_precision
            value: 84.0435176790571
          - type: manhattan_recall
            value: 92.05561072492551
          - type: max_accuracy
            value: 82.9700272479564
          - type: max_ap
            value: 93.15021785539084
          - type: max_f1
            value: 87.97316722568279
      - task:
          type: PairClassification
        dataset:
          type: google-research-datasets/paws-x
          name: MTEB PawsX (fr)
          config: fr
          split: test
          revision: 8a04d940a42cd40658986fdd8e3da561533a3646
        metrics:
          - type: cos_sim_accuracy
            value: 64.14999999999999
          - type: cos_sim_ap
            value: 63.43794001840604
          - type: cos_sim_f1
            value: 62.59187620889749
          - type: cos_sim_precision
            value: 48.097502972651604
          - type: cos_sim_recall
            value: 89.59025470653378
          - type: dot_accuracy
            value: 64.14999999999999
          - type: dot_ap
            value: 63.52400235031554
          - type: dot_f1
            value: 62.59187620889749
          - type: dot_precision
            value: 48.097502972651604
          - type: dot_recall
            value: 89.59025470653378
          - type: euclidean_accuracy
            value: 64.14999999999999
          - type: euclidean_ap
            value: 63.43794001840604
          - type: euclidean_f1
            value: 62.59187620889749
          - type: euclidean_precision
            value: 48.097502972651604
          - type: euclidean_recall
            value: 89.59025470653378
          - type: manhattan_accuracy
            value: 64.2
          - type: manhattan_ap
            value: 63.46163243480347
          - type: manhattan_f1
            value: 62.540021344717175
          - type: manhattan_precision
            value: 46.069182389937104
          - type: manhattan_recall
            value: 97.34219269102991
          - type: max_accuracy
            value: 64.2
          - type: max_ap
            value: 63.52400235031554
          - type: max_f1
            value: 62.59187620889749
      - task:
          type: STS
        dataset:
          type: Lajavaness/SICK-fr
          name: MTEB SICKFr
          config: default
          split: test
          revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
        metrics:
          - type: cos_sim_pearson
            value: 85.12347242597652
          - type: cos_sim_spearman
            value: 79.80580538857501
          - type: euclidean_pearson
            value: 82.03127787921382
          - type: euclidean_spearman
            value: 79.80580538857501
          - type: manhattan_pearson
            value: 82.02795155003601
          - type: manhattan_spearman
            value: 79.7808784011127
      - task:
          type: STS
        dataset:
          type: mteb/sts22-crosslingual-sts
          name: MTEB STS22 (fr)
          config: fr
          split: test
          revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
        metrics:
          - type: cos_sim_pearson
            value: 82.34462624659417
          - type: cos_sim_spearman
            value: 82.83867899462683
          - type: euclidean_pearson
            value: 80.00679113308384
          - type: euclidean_spearman
            value: 82.83867899462683
          - type: manhattan_pearson
            value: 79.97582730301362
          - type: manhattan_spearman
            value: 82.95718926500541
      - task:
          type: STS
        dataset:
          type: mteb/stsb_multi_mt
          name: MTEB STSBenchmarkMultilingualSTS (fr)
          config: fr
          split: test
          revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c
        metrics:
          - type: cos_sim_pearson
            value: 86.0897698618904
          - type: cos_sim_spearman
            value: 86.58814894229229
          - type: euclidean_pearson
            value: 85.53992615842806
          - type: euclidean_spearman
            value: 86.58814894229229
          - type: manhattan_pearson
            value: 85.4985023034774
          - type: manhattan_spearman
            value: 86.50239881298486
      - task:
          type: Summarization
        dataset:
          type: lyon-nlp/summarization-summeval-fr-p2p
          name: MTEB SummEvalFr
          config: default
          split: test
          revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
        metrics:
          - type: cos_sim_pearson
            value: 30.458145110977753
          - type: cos_sim_spearman
            value: 31.624715940109265
          - type: dot_pearson
            value: 30.458145236239915
          - type: dot_spearman
            value: 31.624715940109265
      - task:
          type: Reranking
        dataset:
          type: lyon-nlp/mteb-fr-reranking-syntec-s2p
          name: MTEB SyntecReranking
          config: default
          split: test
          revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad
        metrics:
          - type: map
            value: 87.60277777777777
          - type: mrr
            value: 87.60277777777777
          - type: nAUC_map_diff1
            value: 63.877496103879814
          - type: nAUC_map_max
            value: -4.8943605546581725
          - type: nAUC_mrr_diff1
            value: 63.877496103879814
          - type: nAUC_mrr_max
            value: -4.8943605546581725
      - task:
          type: Retrieval
        dataset:
          type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
          name: MTEB SyntecRetrieval
          config: default
          split: test
          revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9
        metrics:
          - type: map_at_1
            value: 67
          - type: map_at_10
            value: 78.47800000000001
          - type: map_at_100
            value: 78.616
          - type: map_at_1000
            value: 78.616
          - type: map_at_20
            value: 78.52799999999999
          - type: map_at_3
            value: 77.833
          - type: map_at_5
            value: 78.033
          - type: mrr_at_1
            value: 67
          - type: mrr_at_10
            value: 78.47777777777777
          - type: mrr_at_100
            value: 78.61609758846066
          - type: mrr_at_1000
            value: 78.61609758846066
          - type: mrr_at_20
            value: 78.52777777777777
          - type: mrr_at_3
            value: 77.83333333333331
          - type: mrr_at_5
            value: 78.03333333333333
          - type: nauc_map_at_1000_diff1
            value: 54.76919250379753
          - type: nauc_map_at_1000_max
            value: 24.03294042759147
          - type: nauc_map_at_100_diff1
            value: 54.76919250379753
          - type: nauc_map_at_100_max
            value: 24.03294042759147
          - type: nauc_map_at_10_diff1
            value: 54.781660658782506
          - type: nauc_map_at_10_max
            value: 24.45332707633837
          - type: nauc_map_at_1_diff1
            value: 54.48189466912695
          - type: nauc_map_at_1_max
            value: 17.502666282947597
          - type: nauc_map_at_20_diff1
            value: 54.69518355408933
          - type: nauc_map_at_20_max
            value: 24.285263763068183
          - type: nauc_map_at_3_diff1
            value: 54.98928575752318
          - type: nauc_map_at_3_max
            value: 25.252117626643916
          - type: nauc_map_at_5_diff1
            value: 54.51750311747391
          - type: nauc_map_at_5_max
            value: 25.141479081321766
          - type: nauc_mrr_at_1000_diff1
            value: 54.76919250379753
          - type: nauc_mrr_at_1000_max
            value: 24.03294042759147
          - type: nauc_mrr_at_100_diff1
            value: 54.76919250379753
          - type: nauc_mrr_at_100_max
            value: 24.03294042759147
          - type: nauc_mrr_at_10_diff1
            value: 54.781660658782506
          - type: nauc_mrr_at_10_max
            value: 24.45332707633837
          - type: nauc_mrr_at_1_diff1
            value: 54.48189466912695
          - type: nauc_mrr_at_1_max
            value: 17.502666282947597
          - type: nauc_mrr_at_20_diff1
            value: 54.69518355408933
          - type: nauc_mrr_at_20_max
            value: 24.285263763068183
          - type: nauc_mrr_at_3_diff1
            value: 54.98928575752318
          - type: nauc_mrr_at_3_max
            value: 25.252117626643916
          - type: nauc_mrr_at_5_diff1
            value: 54.51750311747391
          - type: nauc_mrr_at_5_max
            value: 25.141479081321766
          - type: nauc_ndcg_at_1000_diff1
            value: 54.411394691026196
          - type: nauc_ndcg_at_1000_max
            value: 25.003182969921014
          - type: nauc_ndcg_at_100_diff1
            value: 54.411394691026196
          - type: nauc_ndcg_at_100_max
            value: 25.003182969921014
          - type: nauc_ndcg_at_10_diff1
            value: 53.97509194326736
          - type: nauc_ndcg_at_10_max
            value: 27.51736442048005
          - type: nauc_ndcg_at_1_diff1
            value: 54.48189466912695
          - type: nauc_ndcg_at_1_max
            value: 17.502666282947597
          - type: nauc_ndcg_at_20_diff1
            value: 53.46713794714154
          - type: nauc_ndcg_at_20_max
            value: 26.601577957753005
          - type: nauc_ndcg_at_3_diff1
            value: 54.521393171396525
          - type: nauc_ndcg_at_3_max
            value: 29.07380139412928
          - type: nauc_ndcg_at_5_diff1
            value: 53.42255297135452
          - type: nauc_ndcg_at_5_max
            value: 28.91110004742623
          - type: nauc_precision_at_1000_diff1
            value: nan
          - type: nauc_precision_at_1000_max
            value: nan
          - type: nauc_precision_at_100_diff1
            value: nan
          - type: nauc_precision_at_100_max
            value: nan
          - type: nauc_precision_at_10_diff1
            value: 41.59663865546228
          - type: nauc_precision_at_10_max
            value: 67.44864612511667
          - type: nauc_precision_at_1_diff1
            value: 54.48189466912695
          - type: nauc_precision_at_1_max
            value: 17.502666282947597
          - type: nauc_precision_at_20_diff1
            value: 26.486150015561265
          - type: nauc_precision_at_20_max
            value: 60.95549330843449
          - type: nauc_precision_at_3_diff1
            value: 50.78781512605074
          - type: nauc_precision_at_3_max
            value: 55.48552754435131
          - type: nauc_precision_at_5_diff1
            value: 43.75750300120062
          - type: nauc_precision_at_5_max
            value: 58.29665199413101
          - type: nauc_recall_at_1000_diff1
            value: nan
          - type: nauc_recall_at_1000_max
            value: nan
          - type: nauc_recall_at_100_diff1
            value: nan
          - type: nauc_recall_at_100_max
            value: nan
          - type: nauc_recall_at_10_diff1
            value: 41.59663865546242
          - type: nauc_recall_at_10_max
            value: 67.44864612511677
          - type: nauc_recall_at_1_diff1
            value: 54.48189466912695
          - type: nauc_recall_at_1_max
            value: 17.502666282947597
          - type: nauc_recall_at_20_diff1
            value: 26.486150015561737
          - type: nauc_recall_at_20_max
            value: 60.95549330843472
          - type: nauc_recall_at_3_diff1
            value: 50.787815126050376
          - type: nauc_recall_at_3_max
            value: 55.48552754435111
          - type: nauc_recall_at_5_diff1
            value: 43.75750300120054
          - type: nauc_recall_at_5_max
            value: 58.29665199413113
          - type: ndcg_at_1
            value: 67
          - type: ndcg_at_10
            value: 82.864
          - type: ndcg_at_100
            value: 83.672
          - type: ndcg_at_1000
            value: 83.672
          - type: ndcg_at_20
            value: 83.092
          - type: ndcg_at_3
            value: 81.464
          - type: ndcg_at_5
            value: 81.851
          - type: precision_at_1
            value: 67
          - type: precision_at_10
            value: 9.6
          - type: precision_at_100
            value: 1
          - type: precision_at_1000
            value: 0.1
          - type: precision_at_20
            value: 4.8500000000000005
          - type: precision_at_3
            value: 30.667
          - type: precision_at_5
            value: 18.6
          - type: recall_at_1
            value: 67
          - type: recall_at_10
            value: 96
          - type: recall_at_100
            value: 100
          - type: recall_at_1000
            value: 100
          - type: recall_at_20
            value: 97
          - type: recall_at_3
            value: 92
          - type: recall_at_5
            value: 93
      - task:
          type: Retrieval
        dataset:
          type: jinaai/xpqa
          name: MTEB XPQARetrieval (fr)
          config: fr
          split: test
          revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
        metrics:
          - type: map_at_1
            value: 40.038000000000004
          - type: map_at_10
            value: 62.409000000000006
          - type: map_at_100
            value: 63.63999999999999
          - type: map_at_1000
            value: 63.693
          - type: map_at_20
            value: 63.165000000000006
          - type: map_at_3
            value: 55.364999999999995
          - type: map_at_5
            value: 59.95399999999999
          - type: mrr_at_1
            value: 62.88384512683578
          - type: mrr_at_10
            value: 70.414944794117
          - type: mrr_at_100
            value: 70.85679259651413
          - type: mrr_at_1000
            value: 70.8680806615119
          - type: mrr_at_20
            value: 70.69824986774621
          - type: mrr_at_3
            value: 68.04628393413438
          - type: mrr_at_5
            value: 69.65509568313303
          - type: nauc_map_at_1000_diff1
            value: 47.58306966781138
          - type: nauc_map_at_1000_max
            value: 49.99853404950863
          - type: nauc_map_at_100_diff1
            value: 47.5473544905194
          - type: nauc_map_at_100_max
            value: 49.98683021023155
          - type: nauc_map_at_10_diff1
            value: 47.443327641163705
          - type: nauc_map_at_10_max
            value: 49.31862257934493
          - type: nauc_map_at_1_diff1
            value: 55.93203426614159
          - type: nauc_map_at_1_max
            value: 27.467436111704224
          - type: nauc_map_at_20_diff1
            value: 47.454162467793985
          - type: nauc_map_at_20_max
            value: 49.715459382963765
          - type: nauc_map_at_3_diff1
            value: 48.910525378486874
          - type: nauc_map_at_3_max
            value: 42.13319318718595
          - type: nauc_map_at_5_diff1
            value: 48.56545403298638
          - type: nauc_map_at_5_max
            value: 47.311811085622445
          - type: nauc_mrr_at_1000_diff1
            value: 56.739822956274224
          - type: nauc_mrr_at_1000_max
            value: 58.274212468278854
          - type: nauc_mrr_at_100_diff1
            value: 56.7308210328899
          - type: nauc_mrr_at_100_max
            value: 58.27250671019899
          - type: nauc_mrr_at_10_diff1
            value: 56.647228471816405
          - type: nauc_mrr_at_10_max
            value: 58.210342990657495
          - type: nauc_mrr_at_1_diff1
            value: 58.618266167104046
          - type: nauc_mrr_at_1_max
            value: 58.55438607166539
          - type: nauc_mrr_at_20_diff1
            value: 56.63534799976597
          - type: nauc_mrr_at_20_max
            value: 58.17181317797869
          - type: nauc_mrr_at_3_diff1
            value: 56.815531582264825
          - type: nauc_mrr_at_3_max
            value: 58.32821204695344
          - type: nauc_mrr_at_5_diff1
            value: 56.79122022985127
          - type: nauc_mrr_at_5_max
            value: 58.20366609452701
          - type: nauc_ndcg_at_1000_diff1
            value: 49.530062263932194
          - type: nauc_ndcg_at_1000_max
            value: 53.473298956705925
          - type: nauc_ndcg_at_100_diff1
            value: 48.95703823297219
          - type: nauc_ndcg_at_100_max
            value: 53.191721124797276
          - type: nauc_ndcg_at_10_diff1
            value: 47.98530786084638
          - type: nauc_ndcg_at_10_max
            value: 51.155857323188016
          - type: nauc_ndcg_at_1_diff1
            value: 58.618266167104046
          - type: nauc_ndcg_at_1_max
            value: 58.55438607166539
          - type: nauc_ndcg_at_20_diff1
            value: 47.95544792051313
          - type: nauc_ndcg_at_20_max
            value: 51.751640167194054
          - type: nauc_ndcg_at_3_diff1
            value: 48.50900656884395
          - type: nauc_ndcg_at_3_max
            value: 50.78667595293348
          - type: nauc_ndcg_at_5_diff1
            value: 49.496100926859654
          - type: nauc_ndcg_at_5_max
            value: 49.089893886856835
          - type: nauc_precision_at_1000_diff1
            value: -19.085707327488784
          - type: nauc_precision_at_1000_max
            value: 22.16522736611267
          - type: nauc_precision_at_100_diff1
            value: -16.92930793417545
          - type: nauc_precision_at_100_max
            value: 26.119556898620655
          - type: nauc_precision_at_10_diff1
            value: -8.586758571265364
          - type: nauc_precision_at_10_max
            value: 34.29909350105018
          - type: nauc_precision_at_1_diff1
            value: 58.618266167104046
          - type: nauc_precision_at_1_max
            value: 58.55438607166539
          - type: nauc_precision_at_20_diff1
            value: -12.36545815755639
          - type: nauc_precision_at_20_max
            value: 30.779202784243694
          - type: nauc_precision_at_3_diff1
            value: 7.173290556095678
          - type: nauc_precision_at_3_max
            value: 43.244915594569356
          - type: nauc_precision_at_5_diff1
            value: -0.5308831428158323
          - type: nauc_precision_at_5_max
            value: 39.78478615216909
          - type: nauc_recall_at_1000_diff1
            value: 44.67738158424653
          - type: nauc_recall_at_1000_max
            value: 71.12276250795361
          - type: nauc_recall_at_100_diff1
            value: 30.071917991701135
          - type: nauc_recall_at_100_max
            value: 42.226214389979326
          - type: nauc_recall_at_10_diff1
            value: 36.275167481806804
          - type: nauc_recall_at_10_max
            value: 40.16796727800884
          - type: nauc_recall_at_1_diff1
            value: 55.93203426614159
          - type: nauc_recall_at_1_max
            value: 27.467436111704224
          - type: nauc_recall_at_20_diff1
            value: 32.189427460851505
          - type: nauc_recall_at_20_max
            value: 38.926081167758205
          - type: nauc_recall_at_3_diff1
            value: 43.959378195689894
          - type: nauc_recall_at_3_max
            value: 36.441633750156335
          - type: nauc_recall_at_5_diff1
            value: 42.6274479464408
          - type: nauc_recall_at_5_max
            value: 38.9902118898862
          - type: ndcg_at_1
            value: 62.88399999999999
          - type: ndcg_at_10
            value: 68.907
          - type: ndcg_at_100
            value: 72.896
          - type: ndcg_at_1000
            value: 73.721
          - type: ndcg_at_20
            value: 70.738
          - type: ndcg_at_3
            value: 62.731
          - type: ndcg_at_5
            value: 65.191
          - type: precision_at_1
            value: 62.88399999999999
          - type: precision_at_10
            value: 16.101
          - type: precision_at_100
            value: 1.951
          - type: precision_at_1000
            value: 0.20600000000000002
          - type: precision_at_20
            value: 8.705
          - type: precision_at_3
            value: 38.095
          - type: precision_at_5
            value: 27.904
          - type: recall_at_1
            value: 40.038000000000004
          - type: recall_at_10
            value: 79.237
          - type: recall_at_100
            value: 94.17699999999999
          - type: recall_at_1000
            value: 99.466
          - type: recall_at_20
            value: 85.027
          - type: recall_at_3
            value: 60.336
          - type: recall_at_5
            value: 70.122
license: apache-2.0
language:
  - fr
metrics:
  - pearsonr
  - spearmanr

bilingual-document-embedding

bilingual-document-embedding is the Embedding Model for document in bilingual language: french and english with context length up to 8096 tokens . This model is a specialized sentence-embedding trained specifically for the bilingual language, leveraging the robust capabilities of BGE M3, a pre-trained language model larged on the BGE M3 architecture. The model utilizes xlm-roberta to encode english-french sentences into a 1024-dimensional vector space, facilitating a wide range of applications from semantic search to text clustering. The embeddings capture the nuanced meanings of english-french sentences, reflecting both the lexical and contextual layers of the language.

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Training and Fine-tuning process

Stage 1: NLI Training

  • Dataset: [(SNLI+XNLI) for english+french]
  • Method: Training using Multi-Negative Ranking Loss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics.

Stage 3: Continued Fine-tuning for Semantic Textual Similarity on STS Benchmark

  • Dataset: [STSB-fr and en]
  • Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.

Stage 4: Advanced Augmentation Fine-tuning

  • Dataset: STSB with generate silver sample from gold sample
  • Method: Employed an advanced strategy using Augmented SBERT with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data dynamically, enhancing the model's robustness and accuracy.

Usage:

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer

sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

model = SentenceTransformer('Lajavaness/bilingual-document-embedding', trust_remote_code=True)
print(embeddings)

Evaluation

TODO

Citation

@article{chen2024bge,
  title={Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation},
  author={Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng},
  journal={arXiv preprint arXiv:2402.03216},
  year={2024}
}

@article{conneau2019unsupervised,
  title={Unsupervised cross-lingual representation learning at scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  journal={arXiv preprint arXiv:1911.02116},
  year={2019}
}

@article{reimers2019sentence,
   title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
   author={Nils Reimers, Iryna Gurevych},
   journal={https://arxiv.org/abs/1908.10084},
   year={2019}
}

@article{thakur2020augmented,
  title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
  author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
  journal={arXiv e-prints},
  pages={arXiv--2010},
  year={2020}