metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:10K<n<100K
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: distilbert/distilroberta-base
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: I hate to.
    sentences:
      - um well i hate to yes i do
      - The dogs are sleeping.
      - The child is running
  - source_sentence: He gave in.
    sentences:
      - Then he gave in.
      - Two men stand outdoors.
      - A girl is outdoors.
  - source_sentence: um pardon me
    sentences:
      - Excuse me.
      - The man has shorts on.
      - The child is running
  - source_sentence: He shrugged.
    sentences:
      - Then he shrugged.
      - A woman pulls a child.
      - There are three workers
  - source_sentence: Loire Valley
    sentences:
      - The Valley of Loire.
      - The people are adults.
      - There are three workers
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on distilbert/distilroberta-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 768
          type: sts-dev-768
        metrics:
          - type: pearson_cosine
            value: 0.8170160257039426
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8255975460077839
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8085164465261383
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8093268377073419
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8090075094521302
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8098830595985685
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5787845464363458
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6050816410949579
            name: Spearman Dot
          - type: pearson_max
            value: 0.8170160257039426
            name: Pearson Max
          - type: spearman_max
            value: 0.8255975460077839
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 512
          type: sts-dev-512
        metrics:
          - type: pearson_cosine
            value: 0.8217137626363546
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8277309683690247
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8085782690288572
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8090394046170676
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8096391371860112
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8097335476296557
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6360180046118354
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6594419705525
            name: Spearman Dot
          - type: pearson_max
            value: 0.8217137626363546
            name: Pearson Max
          - type: spearman_max
            value: 0.8277309683690247
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 256
          type: sts-dev-256
        metrics:
          - type: pearson_cosine
            value: 0.8178918632167268
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8257042835854954
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8072136828107425
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8080863714656285
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8075560498740989
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8082369652944293
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6315912070915749
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6530916665342353
            name: Spearman Dot
          - type: pearson_max
            value: 0.8178918632167268
            name: Pearson Max
          - type: spearman_max
            value: 0.8257042835854954
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 128
          type: sts-dev-128
        metrics:
          - type: pearson_cosine
            value: 0.8083653645114982
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8195265949387541
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8017122988316997
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8044101635226892
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8004513898889509
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8032518581337287
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6054920626583514
            name: Pearson Dot
          - type: spearman_dot
            value: 0.632383085342124
            name: Spearman Dot
          - type: pearson_max
            value: 0.8083653645114982
            name: Pearson Max
          - type: spearman_max
            value: 0.8195265949387541
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 64
          type: sts-dev-64
        metrics:
          - type: pearson_cosine
            value: 0.796901729599783
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8138253855008615
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7907415690444286
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7953162414756638
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7886601701724113
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7935316804874565
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5369879256914492
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5528888392757544
            name: Spearman Dot
          - type: pearson_max
            value: 0.796901729599783
            name: Pearson Max
          - type: spearman_max
            value: 0.8138253855008615
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 768
          type: sts-test-768
        metrics:
          - type: pearson_cosine
            value: 0.7830303669378891
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7773625997426432
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7760379804905847
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7571500188418279
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.776793987384272
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7576769993000992
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5696917713192656
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5537799075128554
            name: Spearman Dot
          - type: pearson_max
            value: 0.7830303669378891
            name: Pearson Max
          - type: spearman_max
            value: 0.7773625997426432
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 512
          type: sts-test-512
        metrics:
          - type: pearson_cosine
            value: 0.7907973459486692
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7782644020369065
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7764511872615909
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7566408579053339
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7782953766500842
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7586949913092054
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6258186284156914
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6181773438089058
            name: Spearman Dot
          - type: pearson_max
            value: 0.7907973459486692
            name: Pearson Max
          - type: spearman_max
            value: 0.7782644020369065
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 256
          type: sts-test-256
        metrics:
          - type: pearson_cosine
            value: 0.790808685432137
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7795179076946794
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7755456692602101
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7562532907515199
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7781380668198093
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7588258189898521
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6154250444753552
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6086811490786919
            name: Spearman Dot
          - type: pearson_max
            value: 0.790808685432137
            name: Pearson Max
          - type: spearman_max
            value: 0.7795179076946794
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 128
          type: sts-test-128
        metrics:
          - type: pearson_cosine
            value: 0.7834894934855864
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.774163149418093
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7698790484447937
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7517016204558167
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7725214617886543
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7540743359385147
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5993484722642376
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5948446351721051
            name: Spearman Dot
          - type: pearson_max
            value: 0.7834894934855864
            name: Pearson Max
          - type: spearman_max
            value: 0.774163149418093
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 64
          type: sts-test-64
        metrics:
          - type: pearson_cosine
            value: 0.76979297806493
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7694166074528175
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7566840706659232
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7416860573345465
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7597470663104835
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.744530141432067
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5373925875692315
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5338006780754848
            name: Spearman Dot
          - type: pearson_max
            value: 0.76979297806493
            name: Pearson Max
          - type: spearman_max
            value: 0.7694166074528175
            name: Spearman Max

SentenceTransformer based on distilbert/distilroberta-base

This is a sentence-transformers model fine-tuned from distilbert/distilroberta-base on the sentence-transformers/all-nli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, it is also evaluated with embeddings truncated to 512, 256, 128, and 64 dimensions.

Model Details

Model Description

  • Model type: Sentence Transformer
  • Base model: distilbert/distilroberta-base
  • Maximum sequence length: 512 tokens
  • Output dimensionality: 768
  • Training dataset: sentence-transformers/all-nli
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
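
The Pooling module above has only pooling_mode_mean_tokens enabled, so each sentence vector is the average of its token vectors. The following is a minimal NumPy sketch of mask-aware mean pooling, not the library's own code; the function name `mean_pool` and the toy shapes are assumptions made for illustration (the real hidden size is 768).

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    return summed / counts

# Toy batch: 2 sentences, 3 token slots, hidden size 4 (hypothetical values).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sentence has one padded slot
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

The padded slot in the second sentence does not contribute to its average, which is the point of multiplying by the mask before summing.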

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("eagle0504/distilroberta-base-nli-matryoshka")
# Run inference
sentences = [
    'Loire Valley',
    'The Valley of Loire.',
    'The people are adults.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
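
Since the model is trained with Matryoshka dimensions 768, 512, 256, 128, and 64, the leading components of each embedding can be used on their own. The sketch below shows the idea with NumPy only: keep the first `dim` values, re-normalize, and compare with cosine similarity. The random matrix `full` is a stand-in for the output of model.encode(sentences), and `truncate_and_normalize` is a hypothetical helper name.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model.encode(sentences): 3 sentences, 768 dimensions.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768)).astype(np.float32)

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_normalize(full, dim)
    sims = small @ small.T  # cosine similarity, because rows are unit length
    print(dim, small.shape, sims.shape)
```

Recent Sentence Transformers releases also accept a truncate_dim argument when constructing SentenceTransformer, which should achieve the same truncation at encode time; check the documentation of the installed version before relying on it.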

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev-768

Metric Value
pearson_cosine 0.817
spearman_cosine 0.8256
pearson_manhattan 0.8085
spearman_manhattan 0.8093
pearson_euclidean 0.809
spearman_euclidean 0.8099
pearson_dot 0.5788
spearman_dot 0.6051
pearson_max 0.817
spearman_max 0.8256

Semantic Similarity

  • Dataset: sts-dev-512

Metric Value
pearson_cosine 0.8217
spearman_cosine 0.8277
pearson_manhattan 0.8086
spearman_manhattan 0.809
pearson_euclidean 0.8096
spearman_euclidean 0.8097
pearson_dot 0.636
spearman_dot 0.6594
pearson_max 0.8217
spearman_max 0.8277

Semantic Similarity

  • Dataset: sts-dev-256

Metric Value
pearson_cosine 0.8179
spearman_cosine 0.8257
pearson_manhattan 0.8072
spearman_manhattan 0.8081
pearson_euclidean 0.8076
spearman_euclidean 0.8082
pearson_dot 0.6316
spearman_dot 0.6531
pearson_max 0.8179
spearman_max 0.8257

Semantic Similarity

  • Dataset: sts-dev-128

Metric Value
pearson_cosine 0.8084
spearman_cosine 0.8195
pearson_manhattan 0.8017
spearman_manhattan 0.8044
pearson_euclidean 0.8005
spearman_euclidean 0.8033
pearson_dot 0.6055
spearman_dot 0.6324
pearson_max 0.8084
spearman_max 0.8195

Semantic Similarity

  • Dataset: sts-dev-64

Metric Value
pearson_cosine 0.7969
spearman_cosine 0.8138
pearson_manhattan 0.7907
spearman_manhattan 0.7953
pearson_euclidean 0.7887
spearman_euclidean 0.7935
pearson_dot 0.537
spearman_dot 0.5529
pearson_max 0.7969
spearman_max 0.8138

Semantic Similarity

  • Dataset: sts-test-768

Metric Value
pearson_cosine 0.783
spearman_cosine 0.7774
pearson_manhattan 0.776
spearman_manhattan 0.7572
pearson_euclidean 0.7768
spearman_euclidean 0.7577
pearson_dot 0.5697
spearman_dot 0.5538
pearson_max 0.783
spearman_max 0.7774

Semantic Similarity

  • Dataset: sts-test-512

Metric Value
pearson_cosine 0.7908
spearman_cosine 0.7783
pearson_manhattan 0.7765
spearman_manhattan 0.7566
pearson_euclidean 0.7783
spearman_euclidean 0.7587
pearson_dot 0.6258
spearman_dot 0.6182
pearson_max 0.7908
spearman_max 0.7783

Semantic Similarity

  • Dataset: sts-test-256

Metric Value
pearson_cosine 0.7908
spearman_cosine 0.7795
pearson_manhattan 0.7755
spearman_manhattan 0.7563
pearson_euclidean 0.7781
spearman_euclidean 0.7588
pearson_dot 0.6154
spearman_dot 0.6087
pearson_max 0.7908
spearman_max 0.7795

Semantic Similarity

  • Dataset: sts-test-128

Metric Value
pearson_cosine 0.7835
spearman_cosine 0.7742
pearson_manhattan 0.7699
spearman_manhattan 0.7517
pearson_euclidean 0.7725
spearman_euclidean 0.7541
pearson_dot 0.5993
spearman_dot 0.5948
pearson_max 0.7835
spearman_max 0.7742

Semantic Similarity

  • Dataset: sts-test-64

Metric Value
pearson_cosine 0.7698
spearman_cosine 0.7694
pearson_manhattan 0.7567
spearman_manhattan 0.7417
pearson_euclidean 0.7597
spearman_euclidean 0.7445
pearson_dot 0.5374
spearman_dot 0.5338
pearson_max 0.7698
spearman_max 0.7694
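
The metric names above pair a correlation (Pearson or Spearman) with a similarity function (cosine, Manhattan, Euclidean, dot). The following is a rough sketch of how such scores can be computed from two lists of sentence embeddings and gold similarity labels. It assumes that distances are negated so that higher means more similar, and that the `_max` entries are the best value across the four similarity functions, which is consistent with the tables (pearson_max equals pearson_cosine in each). The toy embeddings, labels, and function names are all hypothetical.

```python
import numpy as np

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def ranks(x) -> np.ndarray:
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=np.float64)
    less = (x[None, :] < x[:, None]).sum(axis=1)
    leq = (x[None, :] <= x[:, None]).sum(axis=1)
    return (less + leq + 1) / 2.0

def spearman(x, y) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def pairwise_scores(a: np.ndarray, b: np.ndarray) -> dict:
    """Per-pair similarity scores; distances are negated so higher = more similar."""
    dot = (a * b).sum(axis=1)
    return {
        "cosine": dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)),
        "manhattan": -np.abs(a - b).sum(axis=1),
        "euclidean": -np.linalg.norm(a - b, axis=1),
        "dot": dot,
    }

def evaluate(a: np.ndarray, b: np.ndarray, gold: np.ndarray) -> dict:
    results = {}
    names = ("cosine", "manhattan", "euclidean", "dot")
    for name, scores in pairwise_scores(a, b).items():
        results[f"pearson_{name}"] = pearson(scores, gold)
        results[f"spearman_{name}"] = spearman(scores, gold)
    results["pearson_max"] = max(results[f"pearson_{n}"] for n in names)
    results["spearman_max"] = max(results[f"spearman_{n}"] for n in names)
    return results

# Toy data: 4 sentence pairs with 2-dimensional embeddings and gold scores.
emb1 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.2]])
emb2 = np.array([[1.0, 0.1], [1.0, 0.0], [1.0, 0.0], [1.0, 0.3]])
gold = np.array([4.8, 2.5, 0.2, 4.9])
results = evaluate(emb1, emb2, gold)
for key, value in results.items():
    print(key, round(value, 4))
```

Unlike cosine similarity, the dot product depends on vector length, which may explain why the dot-product rows in the tables sit well below the others for this model.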

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 10,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 4 tokens, mean 18.26 tokens, max 88 tokens
    • positive (string): min 4 tokens, mean 11.6 tokens, max 36 tokens
    • negative (string): min 4 tokens, mean 12.09 tokens, max 38 tokens
  • Samples:
    • anchor: Side view of a female triathlete during the run.
      positive: A woman runs
      negative: A man sits
    • anchor: Confused person standing in the middle of the trolley tracks trying to figure out the signs.
      positive: A person is on the tracks.
      negative: A man sits in an airplane.
    • anchor: A woman in a black shirt, jean shorts and white tennis shoes is bowling.
      positive: A woman is bowling in casual clothes
      negative: A woman bowling wins an outfit of clothes
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
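
The parameters above say that MultipleNegativesRankingLoss is applied at five truncation sizes, each with weight 1. The sketch below illustrates that combination in NumPy under stated assumptions: it uses cosine similarity with a scale of 20 for the ranking loss, it uses only in-batch positives as candidates and leaves out the explicit negative column, and the random toy embeddings stand in for model outputs. It is an illustration of the idea, not the library's implementation.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch ranking loss: each anchor should score its own positive highest."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy with diagonal targets

def matryoshka_loss(anchors, positives, dims, weights) -> float:
    """Weighted sum of the base loss over embeddings truncated to each size."""
    return sum(
        w * mnr_loss(anchors[:, :d], positives[:, :d])
        for d, w in zip(dims, weights)
    )

# Toy batch: 8 anchors and noisy copies of them as positives (hypothetical data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
positives = anchors + 0.1 * rng.normal(size=(8, 768))
dims = [768, 512, 256, 128, 64]
weights = [1, 1, 1, 1, 1]

total = matryoshka_loss(anchors, positives, dims, weights)
mismatched = matryoshka_loss(anchors, positives[::-1], dims, weights)
print(total, mismatched)
```

Because the five terms are summed, the reported loss is on a larger scale than a single MultipleNegativesRankingLoss value would be.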
    

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 18.02 tokens, max 66 tokens
    • positive (string): min 5 tokens, mean 9.81 tokens, max 29 tokens
    • negative (string): min 5 tokens, mean 10.37 tokens, max 29 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages.
      positive: Two woman are holding packages.
      negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
      positive: Two kids in numbered jerseys wash their hands.
      negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
      positive: A man selling donuts to a customer.
      negative: A woman drinks her coffee in a small cafe.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | sts-dev-128_spearman_cosine | sts-dev-256_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-64_spearman_cosine | sts-dev-768_spearman_cosine | sts-test-128_spearman_cosine | sts-test-256_spearman_cosine | sts-test-512_spearman_cosine | sts-test-64_spearman_cosine | sts-test-768_spearman_cosine |
| 0.3797 | 30 | 15.8875 | 6.1089 | 0.8036 | 0.8123 | 0.8143 | 0.8010 | 0.8076 | - | - | - | - | - |
| 0.7595 | 60 | 7.4874 | 5.0189 | 0.8195 | 0.8257 | 0.8277 | 0.8138 | 0.8256 | - | - | - | - | - |
| 1.0 | 79 | - | - | - | - | - | - | - | 0.7742 | 0.7795 | 0.7783 | 0.7694 | 0.7774 |
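
The step and epoch values in the log follow from the dataset size and batch size listed earlier. The arithmetic below is a sketch that assumes a single device (gradient_accumulation_steps is 1 in the hyperparameters), a final partial batch that is kept, and a warmup length rounded up from warmup_ratio times the total steps. The `linear_lr` helper is a hypothetical illustration of a linear warmup followed by linear decay, not the trainer's own scheduler code.

```python
import math

num_samples = 10_000      # training samples
batch_size = 128          # per_device_train_batch_size
num_epochs = 1            # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-5            # learning_rate

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding

def linear_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

for step in (30, 60, total_steps):
    epoch = round(step / steps_per_epoch, 4)
    print(step, epoch, linear_lr(step))
```

Under these assumptions one epoch is 79 steps, so steps 30 and 60 fall at epochs 0.3797 and 0.7595, matching the logged rows.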

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.1
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}