---
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:SoftmaxLoss
base_model: google-bert/bert-base-uncased
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: the guy is dead
    sentences:
      - The dog is dead.
      - The man is training the dog.
      - People gather for an event.
  - source_sentence: the boy is five
    sentences:
      - The girl is five years old.
      - A man sits in a hotel lobby.
      - The man is laying on the couch.
  - source_sentence: a guy is waxing
    sentences:
      - A woman is making music.
      - A girl is laying in the pool
      - She is the boy's aunt.
  - source_sentence: Dog herding cows
    sentences:
      - A woman is walking her dog.
      - Both people are standing up.
      - The women are friends.
  - source_sentence: There is a party
    sentences:
      - people take pictures
      - A man is repainting a garage
      - the crew all ate lunch alone
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 3.4540412355858656
  energy_consumed: 0.008886090721390334
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.049
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on google-bert/bert-base-uncased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.5998264726332272
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6439392261876368
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.6232915971361167
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6407370027700541
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.6204725584722414
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6394239914170929
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4799617911944018
            name: Pearson Dot
          - type: spearman_dot
            value: 0.4939854901099171
            name: Spearman Dot
          - type: pearson_max
            value: 0.6232915971361167
            name: Pearson Max
          - type: spearman_max
            value: 0.6439392261876368
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.5516604742812986
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.5840596347673308
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.5842488902993314
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.5886614741524346
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.582443715857982
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.5869827075201962
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4054565422297012
            name: Pearson Dot
          - type: spearman_dot
            value: 0.40476618101346834
            name: Spearman Dot
          - type: pearson_max
            value: 0.5842488902993314
            name: Pearson Max
          - type: spearman_max
            value: 0.5886614741524346
            name: Spearman Max
---

SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased on the sentence-transformers/all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: sentence-transformers/all-nli
  • Language: en

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
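
The Pooling module is configured for mean pooling (pooling_mode_mean_tokens: True): the sentence embedding is the average of all token embeddings, with padding masked out. A minimal sketch of that operation in plain PyTorch (the tensor and function names here are illustrative, not part of the library API):

import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts                         # (batch, 768) sentence embeddings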

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/bert-base-uncased-nli-v1")
# Run inference
sentences = [
    'There is a party',
    'people take pictures',
    'A man is repainting a garage',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity (sts-dev)

Metric Value
pearson_cosine 0.5998
spearman_cosine 0.6439
pearson_manhattan 0.6233
spearman_manhattan 0.6407
pearson_euclidean 0.6205
spearman_euclidean 0.6394
pearson_dot 0.48
spearman_dot 0.494
pearson_max 0.6233
spearman_max 0.6439

Semantic Similarity (sts-test)

Metric Value
pearson_cosine 0.5517
spearman_cosine 0.5841
pearson_manhattan 0.5842
spearman_manhattan 0.5887
pearson_euclidean 0.5824
spearman_euclidean 0.587
pearson_dot 0.4055
spearman_dot 0.4048
pearson_max 0.5842
spearman_max 0.5887
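
Scores like these are obtained by embedding both sides of each STS pair and correlating the model's similarity scores with the human annotations. A sketch using the library's EmbeddingSimilarityEvaluator, assuming the benchmark is loaded from the sentence-transformers/stsb dataset (the sentence1/sentence2/score schema and the test split are assumptions here) and that model is the SentenceTransformer loaded in the Usage section:

from datasets import load_dataset
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Assumed dataset schema: sentence1, sentence2, score (normalized to [0, 1])
stsb = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-test",
)
print(evaluator(model))  # Pearson/Spearman correlations per similarity function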

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at cc6c526
  • Size: 10,000 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 6 tokens, mean: 17.38 tokens, max: 52 tokens
    • hypothesis: string; min: 4 tokens, mean: 10.7 tokens, max: 31 tokens
    • label: int; 0: ~33.40%, 1: ~33.30%, 2: ~33.30%
  • Samples (premise | hypothesis | label):
    • A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1
    • A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2
    • A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0
  • Loss: SoftmaxLoss
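
SoftmaxLoss is the classification objective from the SBERT paper cited below: the premise embedding u and hypothesis embedding v are concatenated as (u, v, |u - v|) and fed through a linear layer that predicts the three NLI labels. A minimal setup sketch, assuming model is the SentenceTransformer being trained:

from sentence_transformers import losses

# 768-dim embeddings, 3 NLI labels (0: entailment, 1: neutral, 2: contradiction)
loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),  # 768
    num_labels=3,
)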

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at cc6c526
  • Size: 1,000 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 6 tokens, mean: 18.44 tokens, max: 57 tokens
    • hypothesis: string; min: 5 tokens, mean: 10.57 tokens, max: 25 tokens
    • label: int; 0: ~33.10%, 1: ~33.30%, 2: ~33.60%
  • Samples (premise | hypothesis | label):
    • Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1
    • Two women are embracing while holding to go packages. | Two woman are holding packages. | 0
    • Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
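
Assembled with the dataset and loss from the sections above, a run with these non-default values looks roughly like the sketch below (output_dir is illustrative):

from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

# pair-class subset: premise / hypothesis / label columns, as described above
train_dataset = load_dataset("sentence-transformers/all-nli", "pair-class", split="train[:10000]")
eval_dataset = load_dataset("sentence-transformers/all-nli", "pair-class", split="dev[:1000]")

args = SentenceTransformerTrainingArguments(
    output_dir="bert-base-uncased-nli-v1",  # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()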

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | Validation Loss | sts-dev spearman cosine | sts-test spearman cosine
0     | 0    | -      | -      | 0.5931 | -
0.16  | 100  | 1.056  | 0.9278 | 0.6555 | -
0.32  | 200  | 0.8966 | 0.8751 | 0.6381 | -
0.48  | 300  | 0.8646 | 0.8393 | 0.6170 | -
0.64  | 400  | 0.8328 | 0.8100 | 0.5804 | -
0.8   | 500  | 0.8307 | 0.7940 | 0.6413 | -
0.96  | 600  | 0.8373 | 0.7602 | 0.6439 | -
1.0   | 625  | -      | -      | -      | 0.5841

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.009 kWh
  • Carbon Emitted: 0.003 kg of CO2
  • Hours Used: 0.049 hours
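
In practice, measurements like these come from wrapping the training run in a codecarbon tracker, roughly as follows (trainer refers to the training sketch above; the exact setup used for this model is not documented here):

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
trainer.train()             # the training run being measured
emissions = tracker.stop()  # kg of CO2-eq; also written to emissions.csv
print(f"{emissions:.6f} kg CO2-eq")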

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}