language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: distilbert/distilroberta-base
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A baby is laughing.
    sentences:
      - The baby laughed in his car seat.
      - A toddler walks down a hallway.
      - Japan falls silent to mark 311 tragedy
  - source_sentence: A woman is reading.
    sentences:
      - A woman is writing something.
      - The man is in a deserted field.
      - Obama urges no new sanctions on Iran
  - source_sentence: A man is spitting.
    sentences:
      - A man is crying.
      - A girl plays a wind instrument.
      - Kids playing ball in the park.
  - source_sentence: A man shoots a man.
    sentences:
      - A man is shooting off guns.
      - A slow loris hanging on a cord.
      - Finance minister promises no new taxes
  - source_sentence: A boy is vacuuming.
    sentences:
      - A little boy is vacuuming the floor.
      - A woman is applying eye shadow.
      - Glorious triple-gold night for Britain
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 94.71657156591533
  energy_consumed: 0.2436740010751561
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.923
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on distilbert/distilroberta-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 256
          type: sts-dev-256
        metrics:
          - type: pearson_cosine
            value: 0.832978199459682
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8449812730792539
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8284059469034439
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8314151253676515
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8291459460248565
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8319080532683886
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7274279213358037
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7358272455513368
            name: Spearman Dot
          - type: pearson_max
            value: 0.832978199459682
            name: Pearson Max
          - type: spearman_max
            value: 0.8449812730792539
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 128
          type: sts-dev-128
        metrics:
          - type: pearson_cosine
            value: 0.8266436609310417
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.841563547795295
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8250171666597236
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8276544602820737
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8255984422889996
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.828520082690129
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7120095981036954
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7163267085950832
            name: Spearman Dot
          - type: pearson_max
            value: 0.8266436609310417
            name: Pearson Max
          - type: spearman_max
            value: 0.841563547795295
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 64
          type: sts-dev-64
        metrics:
          - type: pearson_cosine
            value: 0.817074395539638
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8355573303767316
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8175610864074738
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8212543828500742
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8175058817585
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8216438541895171
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6852246329807953
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6861394760239012
            name: Spearman Dot
          - type: pearson_max
            value: 0.8175610864074738
            name: Pearson Max
          - type: spearman_max
            value: 0.8355573303767316
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 32
          type: sts-dev-32
        metrics:
          - type: pearson_cosine
            value: 0.7963856490231295
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8243820415687734
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7982768947167747
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.804919985023919
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.800259304954162
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8069660671225415
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6311831976256888
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6277202377535699
            name: Spearman Dot
          - type: pearson_max
            value: 0.800259304954162
            name: Pearson Max
          - type: spearman_max
            value: 0.8243820415687734
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 16
          type: sts-dev-16
        metrics:
          - type: pearson_cosine
            value: 0.7401161630034654
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7871969780219474
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7609788932639057
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7761115272699121
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7645256699036285
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7794348361665424
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5201701018366058
            name: Pearson Dot
          - type: spearman_dot
            value: 0.511537896780009
            name: Spearman Dot
          - type: pearson_max
            value: 0.7645256699036285
            name: Pearson Max
          - type: spearman_max
            value: 0.7871969780219474
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 256
          type: sts-test-256
        metrics:
          - type: pearson_cosine
            value: 0.8124139776213125
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8211087618006394
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7835377144525455
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7821679937822867
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.785247473429926
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7839505779526579
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5917356859640799
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5785063907246168
            name: Spearman Dot
          - type: pearson_max
            value: 0.8124139776213125
            name: Pearson Max
          - type: spearman_max
            value: 0.8211087618006394
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 128
          type: sts-test-128
        metrics:
          - type: pearson_cosine
            value: 0.8079155052116238
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8190362316108264
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7794841536695422
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7786315620445202
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.781284034387115
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7812532216784576
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5714349767115854
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5601824337480018
            name: Spearman Dot
          - type: pearson_max
            value: 0.8079155052116238
            name: Pearson Max
          - type: spearman_max
            value: 0.8190362316108264
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 64
          type: sts-test-64
        metrics:
          - type: pearson_cosine
            value: 0.7987987273687178
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8128864395227673
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7727564778562619
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7727917251788465
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7734618345058613
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7751195654319647
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5397052344713898
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5279010425382445
            name: Spearman Dot
          - type: pearson_max
            value: 0.7987987273687178
            name: Pearson Max
          - type: spearman_max
            value: 0.8128864395227673
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 32
          type: sts-test-32
        metrics:
          - type: pearson_cosine
            value: 0.7720012222035324
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7936423982593883
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7561303110063385
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7597271202292094
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7580804607973455
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7628041180101269
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.48898156184384284
            name: Pearson Dot
          - type: spearman_dot
            value: 0.47793665423562026
            name: Spearman Dot
          - type: pearson_max
            value: 0.7720012222035324
            name: Pearson Max
          - type: spearman_max
            value: 0.7936423982593883
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 16
          type: sts-test-16
        metrics:
          - type: pearson_cosine
            value: 0.7137967594997888
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7485767932719462
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7254358927069169
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7339448581065434
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7274341928076351
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7382083636772965
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.385573703763858
            name: Pearson Dot
          - type: spearman_dot
            value: 0.3749226996833225
            name: Spearman Dot
          - type: pearson_max
            value: 0.7274341928076351
            name: Pearson Max
          - type: spearman_max
            value: 0.7485767932719462
            name: Spearman Max

SentenceTransformer based on distilbert/distilroberta-base

This is a sentence-transformers model fine-tuned from distilbert/distilroberta-base on the sentence-transformers/all-nli dataset. It maps sentences and paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (reduced_dim): Dense({'in_features': 768, 'out_features': 256, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
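
The architecture printout lists three stages: a RoBERTa encoder producing one 768-dimensional vector per token, mean pooling over tokens, and a dense layer with a Tanh activation that reduces 768 features to 256. The sketch below mimics the last two stages on toy data in plain Python. The 4 to 2 sizes, the weights, and the helper names `mean_pool` and `dense_tanh` are illustrative assumptions, not the model's real parameters or the library's code.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def dense_tanh(vector, weights, bias):
    """Apply y = tanh(W x + b); `weights` has one row per output feature."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weights, bias)
    ]

# Toy sizes: 4-dimensional token vectors reduced to 2 dimensions
# (the real model goes from 768 to 256).
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the third token is padding and is excluded from the mean
pooled = mean_pool(tokens, mask)

weights = [[0.1, 0.0, 0.0, 0.0], [0.0, -0.1, 0.0, 0.0]]  # made-up values
bias = [0.0, 0.0]
sentence_embedding = dense_tanh(pooled, weights, bias)
print(pooled, sentence_embedding)
```

Because the final activation is Tanh, every component of the output lies strictly between -1 and 1.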

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilroberta-base-nli-matryoshka-reduced")
# Run inference
sentences = [
    'A boy is vacuuming.',
    'A little boy is vacuuming the floor.',
    'A woman is applying eye shadow.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 256]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
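
Since the model was trained with MatryoshkaLoss at 256, 128, 64, 32, and 16 dimensions, the leading components of each embedding can be used on their own at some cost in quality (see the evaluation tables). The sketch below shows the post-processing step in plain Python: keep the first `dim` components, then re-normalize. The vectors are made-up stand-ins for rows of `model.encode(sentences)`, and `truncate_and_normalize` is a hypothetical helper, not a library function.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in 8-dimensional vectors; real embeddings from this model have 256.
emb_a = [0.5, 0.1, -0.3, 0.8, 0.2, -0.1, 0.4, 0.0]
emb_b = [0.4, 0.2, -0.2, 0.7, -0.5, 0.3, 0.1, 0.6]

for dim in (8, 4, 2):
    a = truncate_and_normalize(emb_a, dim)
    b = truncate_and_normalize(emb_b, dim)
    print(dim, round(cosine(a, b), 4))
```

Recent Sentence Transformers releases also appear to accept a `truncate_dim` argument when loading a model, which may be more convenient than truncating by hand.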

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev-256

Metric Value
pearson_cosine 0.833
spearman_cosine 0.845
pearson_manhattan 0.8284
spearman_manhattan 0.8314
pearson_euclidean 0.8291
spearman_euclidean 0.8319
pearson_dot 0.7274
spearman_dot 0.7358
pearson_max 0.833
spearman_max 0.845

Semantic Similarity

  • Dataset: sts-dev-128

Metric Value
pearson_cosine 0.8266
spearman_cosine 0.8416
pearson_manhattan 0.825
spearman_manhattan 0.8277
pearson_euclidean 0.8256
spearman_euclidean 0.8285
pearson_dot 0.712
spearman_dot 0.7163
pearson_max 0.8266
spearman_max 0.8416

Semantic Similarity

  • Dataset: sts-dev-64

Metric Value
pearson_cosine 0.8171
spearman_cosine 0.8356
pearson_manhattan 0.8176
spearman_manhattan 0.8213
pearson_euclidean 0.8175
spearman_euclidean 0.8216
pearson_dot 0.6852
spearman_dot 0.6861
pearson_max 0.8176
spearman_max 0.8356

Semantic Similarity

  • Dataset: sts-dev-32

Metric Value
pearson_cosine 0.7964
spearman_cosine 0.8244
pearson_manhattan 0.7983
spearman_manhattan 0.8049
pearson_euclidean 0.8003
spearman_euclidean 0.807
pearson_dot 0.6312
spearman_dot 0.6277
pearson_max 0.8003
spearman_max 0.8244

Semantic Similarity

  • Dataset: sts-dev-16

Metric Value
pearson_cosine 0.7401
spearman_cosine 0.7872
pearson_manhattan 0.761
spearman_manhattan 0.7761
pearson_euclidean 0.7645
spearman_euclidean 0.7794
pearson_dot 0.5202
spearman_dot 0.5115
pearson_max 0.7645
spearman_max 0.7872

Semantic Similarity

  • Dataset: sts-test-256

Metric Value
pearson_cosine 0.8124
spearman_cosine 0.8211
pearson_manhattan 0.7835
spearman_manhattan 0.7822
pearson_euclidean 0.7852
spearman_euclidean 0.784
pearson_dot 0.5917
spearman_dot 0.5785
pearson_max 0.8124
spearman_max 0.8211

Semantic Similarity

  • Dataset: sts-test-128

Metric Value
pearson_cosine 0.8079
spearman_cosine 0.819
pearson_manhattan 0.7795
spearman_manhattan 0.7786
pearson_euclidean 0.7813
spearman_euclidean 0.7813
pearson_dot 0.5714
spearman_dot 0.5602
pearson_max 0.8079
spearman_max 0.819

Semantic Similarity

  • Dataset: sts-test-64

Metric Value
pearson_cosine 0.7988
spearman_cosine 0.8129
pearson_manhattan 0.7728
spearman_manhattan 0.7728
pearson_euclidean 0.7735
spearman_euclidean 0.7751
pearson_dot 0.5397
spearman_dot 0.5279
pearson_max 0.7988
spearman_max 0.8129

Semantic Similarity

  • Dataset: sts-test-32

Metric Value
pearson_cosine 0.772
spearman_cosine 0.7936
pearson_manhattan 0.7561
spearman_manhattan 0.7597
pearson_euclidean 0.7581
spearman_euclidean 0.7628
pearson_dot 0.489
spearman_dot 0.4779
pearson_max 0.772
spearman_max 0.7936

Semantic Similarity

  • Dataset: sts-test-16

Metric Value
pearson_cosine 0.7138
spearman_cosine 0.7486
pearson_manhattan 0.7254
spearman_manhattan 0.7339
pearson_euclidean 0.7274
spearman_euclidean 0.7382
pearson_dot 0.3856
spearman_dot 0.3749
pearson_max 0.7274
spearman_max 0.7486
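
The pearson_* and spearman_* rows are correlations between the model's similarity scores for sentence pairs and the gold similarity labels; in the tables above, each *_max row equals the largest of the cosine, Manhattan, Euclidean, and dot values. As a rough illustration of what the two correlation types measure, here is a dependency-free sketch. The five predicted and gold scores are invented for the example, and the functions are simplified stand-ins rather than the evaluator's actual code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Invented similarity scores and gold labels for five sentence pairs.
predicted = [0.91, 0.15, 0.78, 0.40, 0.62]
gold = [1.0, 0.0, 0.95, 0.2, 0.5]
print(round(pearson(predicted, gold), 4), round(spearman(predicted, gold), 4))
```

Spearman only looks at the ordering of the pairs, so it reaches 1.0 whenever the model ranks pairs in the same order as the gold labels, even if the raw scores are not linearly related.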

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at 65dd388
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 10.38 tokens, max 45 tokens
    • positive (string): min 6 tokens, mean 12.8 tokens, max 39 tokens
    • negative (string): min 6 tokens, mean 13.4 tokens, max 50 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane.
      positive: A person is outdoors, on a horse.
      negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera
      positive: There are children present
      negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge.
      positive: The boy does a skateboarding trick.
      negative: The boy skates down the sidewalk.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            256,
            128,
            64,
            32,
            16
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
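
To make the parameters above concrete: with `n_dims_per_step` set to -1, every listed dimension is used at each step, and the total loss is the weighted sum of the base loss computed on each truncated prefix of the embeddings. The following plain-Python sketch illustrates that idea with an in-batch ranking loss as the base loss. It is a simplified illustration, not the library's implementation; the `scale` value of 20.0, the toy vectors, and the helper names are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ranking_loss(anchors, positives, negatives, scale=20.0):
    """Cross-entropy over scaled cosine scores. For anchor i the correct
    candidate is positive i; every other positive and every negative in
    the batch is a distractor. `scale` is an assumed temperature factor."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, c) for c in candidates]
        peak = max(scores)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

def truncate(batch, dim):
    """Keep only the first `dim` components of every vector."""
    return [vec[:dim] for vec in batch]

def matryoshka_loss(anchors, positives, negatives, dims, weights):
    """Weighted sum of the base loss over each truncated dimensionality."""
    return sum(
        weight * ranking_loss(
            truncate(anchors, dim), truncate(positives, dim), truncate(negatives, dim)
        )
        for dim, weight in zip(dims, weights)
    )

# Toy 4-dimensional batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
negatives = [[-0.5, 0.7, 0.1, 0.2], [0.6, -0.4, 0.3, 0.1]]

print(matryoshka_loss(anchors, positives, negatives, dims=(4, 2), weights=(1, 1)))
```

With equal weights of 1, as configured here, each of the five dimensionalities contributes equally to the gradient, which pushes the leading components to carry most of the useful signal.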
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 15.0 tokens, max 44 tokens
    • sentence2 (string): min 6 tokens, mean 14.99 tokens, max 61 tokens
    • score (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing.
      sentence2: A man wearing a hard hat is dancing.
      score: 1.0
    • sentence1: A young child is riding a horse.
      sentence2: A child is riding a horse.
      score: 0.95
    • sentence1: A man is feeding a mouse to a snake.
      sentence2: The man is feeding a mouse to the snake.
      score: 1.0
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            256,
            128,
            64,
            32,
            16
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss sts-dev-128_spearman_cosine sts-dev-16_spearman_cosine sts-dev-256_spearman_cosine sts-dev-32_spearman_cosine sts-dev-64_spearman_cosine sts-test-128_spearman_cosine sts-test-16_spearman_cosine sts-test-256_spearman_cosine sts-test-32_spearman_cosine sts-test-64_spearman_cosine
0.0229 100 21.0363 14.2448 0.7856 0.7417 0.7873 0.7751 0.7846 - - - - -
0.0459 200 11.1093 13.4736 0.7877 0.7298 0.7861 0.7687 0.7798 - - - - -
0.0688 300 10.1847 13.7191 0.7877 0.7284 0.7898 0.7617 0.7755 - - - - -
0.0918 400 9.356 13.2955 0.7906 0.7385 0.7914 0.7715 0.7799 - - - - -
0.1147 500 8.9318 12.8099 0.7889 0.7346 0.7910 0.7690 0.7801 - - - - -
0.1376 600 8.5293 13.7384 0.7814 0.7362 0.7866 0.7656 0.7736 - - - - -
0.1606 700 8.7589 13.4466 0.7899 0.7467 0.7945 0.7770 0.7847 - - - - -
0.1835 800 7.7941 13.6734 0.7960 0.7526 0.7986 0.7800 0.7894 - - - - -
0.2065 900 7.9183 12.9082 0.7885 0.7470 0.7966 0.7705 0.7803 - - - - -
0.2294 1000 7.3669 13.2827 0.7751 0.7181 0.7822 0.7557 0.7675 - - - - -
0.2524 1100 7.6205 13.0227 0.7875 0.7373 0.7914 0.7730 0.7828 - - - - -
0.2753 1200 7.4308 13.4980 0.7844 0.7373 0.7890 0.7709 0.7755 - - - - -
0.2982 1300 7.3625 12.8380 0.7984 0.7520 0.8032 0.7824 0.7915 - - - - -
0.3212 1400 6.9421 12.7016 0.7912 0.7358 0.7960 0.7749 0.7850 - - - - -
0.3441 1500 7.0635 13.2198 0.8018 0.7578 0.8070 0.7861 0.7961 - - - - -
0.3671 1600 6.6682 13.3225 0.7906 0.7522 0.7944 0.7763 0.7849 - - - - -
0.3900 1700 6.42 12.7381 0.7984 0.7449 0.8021 0.7806 0.7911 - - - - -
0.4129 1800 6.659 13.0247 0.7947 0.7461 0.8002 0.7808 0.7876 - - - - -
0.4359 1900 6.1664 12.6814 0.7893 0.7312 0.7959 0.7700 0.7807 - - - - -
0.4588 2000 6.392 13.0238 0.7935 0.7354 0.7987 0.7758 0.7860 - - - - -
0.4818 2100 6.177 12.8833 0.7891 0.7428 0.7924 0.7723 0.7801 - - - - -
0.5047 2200 6.0411 12.5269 0.7836 0.7400 0.7875 0.7664 0.7765 - - - - -
0.5276 2300 6.1506 13.4349 0.7741 0.7350 0.7803 0.7556 0.7634 - - - - -
0.5506 2400 6.109 12.6996 0.7808 0.7326 0.7860 0.7663 0.7735 - - - - -
0.5735 2500 6.2849 13.2831 0.7874 0.7365 0.7932 0.7727 0.7794 - - - - -
0.5965 2600 6.0658 12.9425 0.7988 0.7481 0.8042 0.7818 0.7889 - - - - -
0.6194 2700 6.0646 13.0144 0.7965 0.7509 0.8010 0.7800 0.7875 - - - - -
0.6423 2800 6.0795 12.7602 0.7912 0.7472 0.7937 0.7778 0.7818 - - - - -
0.6653 2900 6.2407 13.2381 0.7829 0.7381 0.7873 0.7664 0.7765 - - - - -
0.6882 3000 6.1872 12.9064 0.7942 0.7516 0.7965 0.7793 0.7857 - - - - -
0.7112 3100 5.8987 12.9323 0.8065 0.7585 0.8087 0.7909 0.7989 - - - - -
0.7341 3200 5.996 13.1017 0.7971 0.7566 0.8005 0.7811 0.7889 - - - - -
0.7571 3300 5.3748 12.7601 0.8398 0.7881 0.8441 0.8232 0.8337 - - - - -
0.7800 3400 4.0798 12.7221 0.8400 0.7908 0.8440 0.8255 0.8342 - - - - -
0.8029 3500 3.6024 12.5445 0.8408 0.7892 0.8447 0.8247 0.8347 - - - - -
0.8259 3600 3.4619 12.6025 0.8405 0.7883 0.8442 0.8255 0.8347 - - - - -
0.8488 3700 3.2288 12.6636 0.8388 0.7872 0.8433 0.8226 0.8330 - - - - -
0.8718 3800 3.0543 12.6475 0.8386 0.7834 0.8427 0.8229 0.8330 - - - - -
0.8947 3900 3.0368 12.5390 0.8407 0.7845 0.8444 0.8227 0.8346 - - - - -
0.9176 4000 2.9591 12.5709 0.8419 0.7864 0.8456 0.8245 0.8359 - - - - -
0.9406 4100 2.944 12.6029 0.8415 0.7868 0.8452 0.8245 0.8359 - - - - -
0.9635 4200 2.9032 12.5514 0.8423 0.7888 0.8455 0.8254 0.8363 - - - - -
0.9865 4300 2.838 12.6054 0.8416 0.7872 0.8450 0.8244 0.8356 - - - - -
1.0 4359 - - - - - - - 0.8190 0.7486 0.8211 0.7936 0.8129

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.244 kWh
  • Carbon Emitted: 0.095 kg of CO2
  • Hours Used: 0.923 hours
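
Two derived quantities can help put these figures in context: the implied carbon intensity of the electricity used and the average power draw during training. The short calculation below uses the unrounded values from the card metadata and assumes the `emissions` field is expressed in grams of CO2, which is consistent with the 0.095 kg reported above.

```python
energy_kwh = 0.2436740010751561   # energy_consumed from the card metadata
emissions_g = 94.71657156591533   # emissions from the card metadata, read as grams
hours = 0.923                     # hours_used

carbon_intensity = emissions_g / energy_kwh   # grams of CO2 per kWh
average_power_w = energy_kwh / hours * 1000   # mean power draw in watts

print(f"{carbon_intensity:.1f} g CO2/kWh, {average_power_w:.1f} W average")
```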

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}