language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:MSELoss
base_model: nreimers/TinyBERT_L-4_H-312_v2
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - negative_mse
widget:
  - source_sentence: A woman at home.
    sentences:
      - The woman is inside.
      - The woman is performing for an audience.
      - The two men are friends
  - source_sentence: boys play football
    sentences:
      - Rival college football players are playing a football game.
      - A man looks at his watch at a bus stop.
      - A woman walking on an old bridge near a mountain.
  - source_sentence: Nobody has a pot
    sentences:
      - Nobody has a suit
      - A woman riding a bicycle on the street.
      - The front is decorated with Ethiopian themes and motifs.
  - source_sentence: A dog plays ball.
    sentences:
      - A dog with a ball.
      - A man looking into a microscope in a lab
      - Children go past their parents.
  - source_sentence: A person standing
    sentences:
      - There is a person standing outside
      - A young man plays a racing video game.
      - Two children playing on the floor with toy trains.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 3.457859864142588
  energy_consumed: 0.00889591477312334
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.054
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on nreimers/TinyBERT_L-4_H-312_v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8077673131159315
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8208863013753134
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8225516575982812
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8203236078973807
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8215663439432439
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8202318953605339
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7901487535994149
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7914362691291718
            name: Spearman Dot
          - type: pearson_max
            value: 0.8225516575982812
            name: Pearson Max
          - type: spearman_max
            value: 0.8208863013753134
            name: Spearman Max
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: negative_mse
            value: -50.125449895858765
            name: Negative Mse
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7516961775809978
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7558402072520215
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7762734499549059
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.75965556867712
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7705568379382428
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7553604477247078
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7306801501272192
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7097993872384684
            name: Spearman Dot
          - type: pearson_max
            value: 0.7762734499549059
            name: Pearson Max
          - type: spearman_max
            value: 0.75965556867712
            name: Spearman Max

SentenceTransformer based on nreimers/TinyBERT_L-4_H-312_v2

This is a sentence-transformers model finetuned from nreimers/TinyBERT_L-4_H-312_v2 on the sentence-transformers/wikipedia-en-sentences dataset. It maps sentences & paragraphs to a 312-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
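
The Pooling module above has `pooling_mode_mean_tokens` set to True, so each sentence embedding is the average of its token embeddings. The following is a minimal NumPy sketch of that idea, assuming padded positions are excluded through the attention mask. The function name `mean_pool` and the toy arrays are hypothetical and use a hidden size of 4 instead of 312 for readability.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden) array
    attention_mask:   (batch, seq_len) array with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy batch: 2 sentences, 3 token slots, hidden size 4.
tokens = np.array([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last slot is padding
    [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # one vector per sentence
```

The padded slot in the first sentence does not influence its pooled vector, which is the point of weighting by the mask.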

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/TinyBERT_L-4_H-312_v2-distilled-from-stsb-roberta-base-v2")
# Run inference
sentences = [
    'A person standing',
    'There is a person standing outside',
    'A young man plays a racing video game.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 312]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
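
To illustrate what a pairwise similarity matrix contains, here is a small NumPy sketch that computes cosine similarity between every pair of rows. It assumes the model uses cosine similarity as its similarity function, which this card does not state explicitly. The helper `cosine_similarity_matrix` and the 2-dimensional toy vectors are hypothetical stand-ins for real 312-dimensional embeddings.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Toy stand-ins for three sentence embeddings.
toy_embeddings = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
toy_similarities = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(toy_similarities.shape)
```

Each diagonal entry compares a vector with itself, and orthogonal vectors (the first and third rows here) score zero.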

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

| Metric | Value |
| --- | --- |
| pearson_cosine | 0.8078 |
| spearman_cosine | 0.8209 |
| pearson_manhattan | 0.8226 |
| spearman_manhattan | 0.8203 |
| pearson_euclidean | 0.8216 |
| spearman_euclidean | 0.8202 |
| pearson_dot | 0.7901 |
| spearman_dot | 0.7914 |
| pearson_max | 0.8226 |
| spearman_max | 0.8209 |

Knowledge Distillation

| Metric | Value |
| --- | --- |
| negative_mse | -50.1254 |
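
A short sketch of how a negative MSE score of this kind can be computed between student and teacher embeddings. The factor of 100 is an assumption inferred from the training logs, where an evaluation loss of 0.5013 appears alongside a negative_mse of -50.1254. The function name and toy arrays are hypothetical.

```python
import numpy as np

def negative_mse(student: np.ndarray, teacher: np.ndarray) -> float:
    """Return -100 * mean squared error over all embedding elements.

    Higher (closer to zero) is better, so the score can be maximized
    like the other evaluation metrics.
    """
    mse = float(np.mean((student - teacher) ** 2))
    return -100.0 * mse

# Toy 2-dimensional "embeddings"; the labels in this card have 312 elements.
teacher = np.array([[1.0, 2.0], [3.0, 4.0]])
student = np.array([[1.0, 1.0], [3.0, 6.0]])

score = negative_mse(student, teacher)
print(score)
```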

Semantic Similarity

  • Dataset: sts-test

| Metric | Value |
| --- | --- |
| pearson_cosine | 0.7517 |
| spearman_cosine | 0.7558 |
| pearson_manhattan | 0.7763 |
| spearman_manhattan | 0.7597 |
| pearson_euclidean | 0.7706 |
| spearman_euclidean | 0.7554 |
| pearson_dot | 0.7307 |
| spearman_dot | 0.7098 |
| pearson_max | 0.7763 |
| spearman_max | 0.7597 |
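
The Pearson and Spearman metrics above measure how well predicted similarity scores track gold similarity labels. The sketch below shows the difference between the two on made-up numbers. It is not the library's evaluator, and the `rank` helper assumes there are no tied values.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Linear correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def rank(values: np.ndarray) -> np.ndarray:
    """Ranks starting at 1; assumes no ties for simplicity."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson(rank(x), rank(y))

# Hypothetical gold scores and model similarities (monotone but nonlinear).
gold = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
predicted = np.array([0.1, 0.2, 0.4, 0.8, 1.6])

pearson_score = pearson(gold, predicted)
spearman_score = spearman(gold, predicted)
print(pearson_score, spearman_score)
```

Because the predictions preserve the ordering of the gold scores, the Spearman value is perfect while the Pearson value is lower. This is why rank correlation is often preferred when only the ordering of similarities matters.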

Training Details

Training Dataset

sentence-transformers/wikipedia-en-sentences

  • Dataset: sentence-transformers/wikipedia-en-sentences at 4a0972d
  • Size: 200,000 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence | label |
    | --- | --- | --- |
    | type    | string | list |
    | details | min: 4 tokens, mean: 12.24 tokens, max: 52 tokens | size: 312 elements |

  • Samples:

    | sentence | label |
    | --- | --- |
    | A person on a horse jumps over a broken down airplane. | [-0.09614687412977219, 0.6815224885940552, 2.702199935913086, 1.8371250629425049, -1.2949433326721191, ...] |
    | Children smiling and waving at camera | [2.769360303878784, 3.074428081512451, -7.291755676269531, 5.248741149902344, 2.85081148147583, ...] |
    | A boy is jumping on skateboard in the middle of a red bridge. | [-3.0669667720794678, 2.9899890422821045, -1.253997802734375, 6.15218448638916, 0.5838223099708557, ...] |
  • Loss: MSELoss

Evaluation Dataset

sentence-transformers/wikipedia-en-sentences

  • Dataset: sentence-transformers/wikipedia-en-sentences at 4a0972d
  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence | label |
    | --- | --- | --- |
    | type    | string | list |
    | details | min: 5 tokens, mean: 13.23 tokens, max: 57 tokens | size: 312 elements |

  • Samples:

    | sentence | label |
    | --- | --- |
    | Two women are embracing while holding to go packages. | [6.200135707855225, -2.0865142345428467, -2.1313390731811523, -1.9593913555145264, -1.081985592842102, ...] |
    | Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | [1.7725015878677368, 0.6873414516448975, -2.5191268920898438, 3.866339683532715, 2.853647470474243, ...] |
    | A man selling donuts to a customer during a world exhibition event held in the city of Angeles | [-3.317653179168701, 3.0908589363098145, 0.1683920919895172, -2.4405274391174316, -3.1366524696350098, ...] |
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

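| Epoch | Step | Training Loss | Validation Loss | negative_mse | sts-dev_spearman_cosine | sts-test_spearman_cosine |
| --- | --- | --- | --- | --- | --- | --- |
| 0.032 | 100 | 0.8847 | - | - | - | - |
| 0.064 | 200 | 0.8136 | - | - | - | - |
| 0.096 | 300 | 0.697 | - | - | - | - |
| 0.128 | 400 | 0.6128 | - | - | - | - |
| 0.16 | 500 | 0.5634 | 0.6324 | -63.2356 | 0.7564 | - |
| 0.192 | 600 | 0.5294 | - | - | - | - |
| 0.224 | 700 | 0.5035 | - | - | - | - |
| 0.256 | 800 | 0.4861 | - | - | - | - |
| 0.288 | 900 | 0.4668 | - | - | - | - |
| 0.32 | 1000 | 0.4515 | 0.5673 | -56.7263 | 0.7965 | - |
| 0.352 | 1100 | 0.4376 | - | - | - | - |
| 0.384 | 1200 | 0.4274 | - | - | - | - |
| 0.416 | 1300 | 0.4178 | - | - | - | - |
| 0.448 | 1400 | 0.4098 | - | - | - | - |
| 0.48 | 1500 | 0.4053 | 0.5354 | -53.5381 | 0.8091 | - |
| 0.512 | 1600 | 0.3934 | - | - | - | - |
| 0.544 | 1700 | 0.391 | - | - | - | - |
| 0.576 | 1800 | 0.3848 | - | - | - | - |
| 0.608 | 1900 | 0.3785 | - | - | - | - |
| 0.64 | 2000 | 0.3737 | 0.5168 | -51.6829 | 0.8159 | - |
| 0.672 | 2100 | 0.3716 | - | - | - | - |
| 0.704 | 2200 | 0.3695 | - | - | - | - |
| 0.736 | 2300 | 0.3666 | - | - | - | - |
| 0.768 | 2400 | 0.3616 | - | - | - | - |
| 0.8 | 2500 | 0.358 | 0.5067 | -50.6687 | 0.8189 | - |
| 0.832 | 2600 | 0.3551 | - | - | - | - |
| 0.864 | 2700 | 0.3544 | - | - | - | - |
| 0.896 | 2800 | 0.3524 | - | - | - | - |
| 0.928 | 2900 | 0.3524 | - | - | - | - |
| **0.96** | **3000** | **0.3529** | **0.5013** | **-50.1254** | **0.8209** | **-** |
| 0.992 | 3100 | 0.3496 | - | - | - | - |
| 1.0 | 3125 | - | - | - | - | 0.7558 |

  • The bold row denotes the saved checkpoint.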

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.009 kWh
  • Carbon Emitted: 0.003 kg of CO2
  • Hours Used: 0.054 hours
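
As a rough cross-check, the rounded figures above can be related to the unrounded values in the card metadata. The sketch assumes the metadata `emissions` value is in grams of CO2-equivalent and `energy_consumed` is in kWh. The derived carbon intensity and average power are back-of-the-envelope estimates, not values reported by CodeCarbon.

```python
# Unrounded values copied from the card metadata.
emissions_g = 3.457859864142588    # assumed to be grams of CO2-equivalent
energy_kwh = 0.00889591477312334   # assumed to be kWh
hours_used = 0.054                 # already rounded in the metadata

emissions_kg = emissions_g / 1000                 # value reported under "Carbon Emitted"
intensity_g_per_kwh = emissions_g / energy_kwh    # implied grid carbon intensity
avg_power_w = energy_kwh / hours_used * 1000      # implied average power draw

print(f"{emissions_kg:.3f} kg CO2eq")
print(f"{intensity_g_per_kwh:.0f} g CO2eq per kWh (derived estimate)")
print(f"{avg_power_w:.0f} W average draw (derived estimate)")
```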

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}