tomaarsen's picture
tomaarsen HF staff
Add new SentenceTransformer model.
efa1ad0 verified
metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:MultipleNegativesRankingLoss
base_model: distilbert/distilroberta-base
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: There's a dock
    sentences:
      - A boat docked on a river.
      - The girl is standing.
      - The boy is sleeping.
  - source_sentence: The boy scowls
    sentences:
      - The boy is smiling
      - A story book is open.
      - Two women are sleeping.
  - source_sentence: A bird flying.
    sentences:
      - an eagle flies
      - The person is amused.
      - Two men are sleeping.
  - source_sentence: an eagle flies
    sentences:
      - A butterfly flys freely.
      - Two men are sleeping.
      - Some men sleep.
  - source_sentence: A woman sings.
    sentences:
      - The woman is singing.
      - a man is wearing blue
      - The boy is sleeping.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 1.414068558007261
  energy_consumed: 0.003637924574628535
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.02
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on distilbert/distilroberta-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.7472500570689873
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7815286852337371
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7466164303556344
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7564406124153681
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7470476982963574
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7553538112024218
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.46791742113291
            name: Pearson Dot
          - type: spearman_dot
            value: 0.48306144010812363
            name: Spearman Dot
          - type: pearson_max
            value: 0.7472500570689873
            name: Pearson Max
          - type: spearman_max
            value: 0.7815286852337371
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7145936155377322
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7188509446042572
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7144637059488601
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7051742909657058
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7150126984629757
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7054604043597239
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4317482386066799
            name: Pearson Dot
          - type: spearman_dot
            value: 0.4292906929274994
            name: Spearman Dot
          - type: pearson_max
            value: 0.7150126984629757
            name: Pearson Max
          - type: spearman_max
            value: 0.7188509446042572
            name: Spearman Max

SentenceTransformer based on distilbert/distilroberta-base

This is a sentence-transformers model finetuned from distilbert/distilroberta-base on the sentence-transformers/all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilroberta-base-nli-v2")
# Run inference
sentences = [
    'A woman sings.',
    'The woman is singing.',
    'a man is wearing blue',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.7473
spearman_cosine 0.7815
pearson_manhattan 0.7466
spearman_manhattan 0.7564
pearson_euclidean 0.747
spearman_euclidean 0.7554
pearson_dot 0.4679
spearman_dot 0.4831
pearson_max 0.7473
spearman_max 0.7815

Semantic Similarity

Metric Value
pearson_cosine 0.7146
spearman_cosine 0.7189
pearson_manhattan 0.7145
spearman_manhattan 0.7052
pearson_euclidean 0.715
spearman_euclidean 0.7055
pearson_dot 0.4317
spearman_dot 0.4293
pearson_max 0.715
spearman_max 0.7189

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at cc6c526
  • Size: 10,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor positive negative
    type string string string
    details
    • min: 7 tokens
    • mean: 10.38 tokens
    • max: 45 tokens
    • min: 6 tokens
    • mean: 12.8 tokens
    • max: 39 tokens
    • min: 6 tokens
    • mean: 13.4 tokens
    • max: 50 tokens
  • Samples:
    anchor positive negative
    A person on a horse jumps over a broken down airplane. A person is outdoors, on a horse. A person is at a diner, ordering an omelette.
    Children smiling and waving at camera There are children present The kids are frowning
    A boy is jumping on skateboard in the middle of a red bridge. The boy does a skateboarding trick. The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at cc6c526
  • Size: 1,000 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor positive negative
    type string string string
    details
    • min: 6 tokens
    • mean: 18.02 tokens
    • max: 66 tokens
    • min: 5 tokens
    • mean: 9.81 tokens
    • max: 29 tokens
    • min: 5 tokens
    • mean: 10.37 tokens
    • max: 29 tokens
  • Samples:
    anchor positive negative
    Two women are embracing while holding to go packages. Two woman are holding packages. The men are fighting outside a deli.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. Two kids in numbered jerseys wash their hands. Two kids in jackets walk to school.
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles A man selling donuts to a customer. A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step loss sts-dev_spearman_cosine sts-test_spearman_cosine
0 0 - 0.6375 -
0.1266 10 2.9835 0.7807 -
0.2532 20 1.7048 0.7782 -
0.3797 30 1.6657 0.7847 -
0.5063 40 1.7352 0.7900 -
0.6329 50 1.6400 0.7863 -
0.7595 60 1.7281 0.7820 -
0.8861 70 1.7066 0.7815 -
1.0 79 - - 0.7189

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.004 kWh
  • Carbon Emitted: 0.001 kg of CO2
  • Hours Used: 0.02 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}