---
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:CosineSimilarityLoss
base_model: google-bert/bert-base-uncased
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A woman is dancing.
    sentences:
      - An audience watches a girl dance.
      - A man is outside on a July day.
      - A man is cutting up carrots.
  - source_sentence: A man shoots a man.
    sentences:
      - The man is aiming a gun.
      - A helicopter flies over water.
      - a dog trots through the grass.
  - source_sentence: A man is spitting.
    sentences:
      - A man is crying.
      - A helicopter flies over water.
      - A slow loris hanging on a cord.
  - source_sentence: A boy is vacuuming.
    sentences:
      - A little boy is vacuuming the floor.
      - A guy is playing an instrument.
      - A woman equestrian riding a horse.
  - source_sentence: A woman is reading.
    sentences:
      - A woman is writing something.
      - A man is standing in the rain.
      - A man slices an onion.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 4.738044659547021
  energy_consumed: 0.012189401288254294
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.058
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on google-bert/bert-base-uncased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8682431647858876
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8703313606188837
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8385159885167599
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8435007318066774
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8391102057706885
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8441165556372876
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8140605796498762
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8174591525223206
            name: Spearman Dot
          - type: pearson_max
            value: 0.8682431647858876
            name: Pearson Max
          - type: spearman_max
            value: 0.8703313606188837
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8418519780467144
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8363102079867478
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8282641539296681
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8261442750405601
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8279900369159026
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8258841934048688
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7681509901549408
            name: Pearson Dot
          - type: spearman_dot
            value: 0.757455580460212
            name: Spearman Dot
          - type: pearson_max
            value: 0.8418519780467144
            name: Pearson Max
          - type: spearman_max
            value: 0.8363102079867478
            name: Spearman Max
---

SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased on the sentence-transformers/stsb dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/stsb
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
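
For illustration, the same two-module architecture can be assembled by hand with the sentence_transformers.models API. This is a minimal sketch of the composition, not how the released checkpoint was produced:

from sentence_transformers import SentenceTransformer, models

# Transformer module: BERT with a 512-token maximum sequence length
word_embedding_model = models.Transformer(
    "google-bert/bert-base-uncased", max_seq_length=512
)
# Pooling module: mean pooling over the 768-dimensional token embeddings
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])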

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/bert-base-uncased-augmentation-indomain-nlpaug-sts")
# Run inference
sentences = [
    'A woman is reading.',
    'A woman is writing something.',
    'A man is standing in the rain.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8682 |
| spearman_cosine    | 0.8703 |
| pearson_manhattan  | 0.8385 |
| spearman_manhattan | 0.8435 |
| pearson_euclidean  | 0.8391 |
| spearman_euclidean | 0.8441 |
| pearson_dot        | 0.8141 |
| spearman_dot       | 0.8175 |
| pearson_max        | 0.8682 |
| spearman_max       | 0.8703 |

Semantic Similarity

  • Dataset: sts-test

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8419 |
| spearman_cosine    | 0.8363 |
| pearson_manhattan  | 0.8283 |
| spearman_manhattan | 0.8261 |
| pearson_euclidean  | 0.8280 |
| spearman_euclidean | 0.8259 |
| pearson_dot        | 0.7682 |
| spearman_dot       | 0.7575 |
| pearson_max        | 0.8419 |
| spearman_max       | 0.8363 |
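
As a hedged sketch, metrics like these can be recomputed with the library's EmbeddingSimilarityEvaluator; the use of the stsb test split and the evaluator settings below are assumptions, not a record of the original evaluation run:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("tomaarsen/bert-base-uncased-augmentation-indomain-nlpaug-sts")
test = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=test["score"],
    name="sts-test",
)
# Returns Pearson/Spearman correlations per similarity function
print(evaluator(model))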

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at d999f12
  • Size: 11,498 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                        | sentence2                                        | score                          |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                           | float                          |
    | details | min: 6 tokens, mean: 10.0 tokens, max: 28 tokens | min: 5 tokens, mean: 9.95 tokens, max: 25 tokens | min: 0.0, mean: 0.54, max: 1.0 |
  • Samples:
    | sentence1                                     | sentence2                                                | score |
    |:----------------------------------------------|:---------------------------------------------------------|:------|
    | A plane is taking off.                        | An air plane is taking off.                              | 1.0   |
    | A man is playing a large flute.               | A man is playing a flute.                                | 0.76  |
    | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76  |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
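
In effect, CosineSimilarityLoss regresses the cosine similarity of the two sentence embeddings against the gold score using the configured loss_fct (MSELoss here). A toy PyTorch sketch of that computation, with made-up embeddings and scores:

import torch
import torch.nn.functional as F

# Stand-in sentence embeddings for a batch of two pairs (dim 4 for brevity)
emb1 = torch.randn(2, 4)
emb2 = torch.randn(2, 4)
gold = torch.tensor([1.0, 0.76])  # gold similarity scores in [0, 1]

# Cosine similarity of each pair, regressed against the gold score with MSE
pred = F.cosine_similarity(emb1, emb2)
loss = F.mse_loss(pred, gold)
print(loss)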
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at d999f12
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                        | sentence2                                         | score                          |
    |:--------|:-------------------------------------------------|:--------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                            | float                          |
    | details | min: 5 tokens, mean: 15.1 tokens, max: 45 tokens | min: 6 tokens, mean: 15.11 tokens, max: 53 tokens | min: 0.0, mean: 0.47, max: 1.0 |
  • Samples:
    | sentence1                            | sentence2                                | score |
    |:-------------------------------------|:-----------------------------------------|:------|
    | A man with a hard hat is dancing.    | A man wearing a hard hat is dancing.     | 1.0   |
    | A young child is riding a horse.     | A child is riding a horse.               | 0.95  |
    | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0   |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
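
A minimal sketch of a training run with these non-default values, using the Sentence Transformers v3 trainer. The output directory is arbitrary, and the exact data preparation may differ from the stock splits loaded here (the card reports 11,498 training samples, consistent with an augmented training set):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("google-bert/bert-base-uncased")
dataset = load_dataset("sentence-transformers/stsb")
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/bert-base-uncased-sts",  # arbitrary path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    loss=loss,
)
trainer.train()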

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | loss   | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:------:|:----:|:-------------:|:------:|:-----------------------:|:------------------------:|
| 0.1391 | 100  | 0.0572        | 0.0427 | 0.8222                  | -                        |
| 0.2782 | 200  | 0.0316        | 0.0342 | 0.8450                  | -                        |
| 0.4172 | 300  | 0.0276        | 0.0324 | 0.8621                  | -                        |
| 0.5563 | 400  | 0.0246        | 0.0300 | 0.8661                  | -                        |
| 0.6954 | 500  | 0.0206        | 0.0288 | 0.8650                  | -                        |
| 0.8345 | 600  | 0.0186        | 0.0301 | 0.8696                  | -                        |
| 0.9736 | 700  | 0.0185        | 0.0286 | 0.8703                  | -                        |
| 1.0    | 719  | -             | -      | 0.8363                  | 0.8363                   |

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.012 kWh
  • Carbon Emitted: 0.005 kg of CO2
  • Hours Used: 0.058 hours
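
For reference, a minimal CodeCarbon measurement looks like the sketch below; the tracker placement around the training call is an assumption, not a record of the original setup:

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... run training here ...
emissions_kg = tracker.stop()  # returns estimated emissions in kg CO2eq
print(emissions_kg)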

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}