SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 15.6M parameters (F32, safetensors)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
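
Concretely, the Pooling module averages the transformer's token embeddings (ignoring padding) and Normalize scales the result to unit length, so dot products between outputs equal cosine similarities. The following is a minimal sketch of that computation using transformers directly; loading this repository with AutoModel/AutoTokenizer is an assumption, though sentence-transformers checkpoints normally expose plain Hugging Face weights.

```python
# A minimal sketch of the module stack above, done by hand.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aleynahukmet/all-MiniLM-L6-v2-8-layers")
encoder = AutoModel.from_pretrained("aleynahukmet/all-MiniLM-L6-v2-8-layers")

batch = tokenizer(
    ["A person on a horse jumps over a broken down airplane."],
    padding=True, truncation=True, max_length=256, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# (2) Normalize: unit-length vectors, so dot product == cosine similarity.
embedding = torch.nn.functional.normalize(pooled, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384])
```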

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aleynahukmet/all-MiniLM-L6-v2-8-layers")
# Run inference
sentences = [
    'A black dog is drinking next to a brown and white dog that is looking at an orange ball in the lake, whilst a horse and rider passes behind.',
    'There are two people running around a track in lane three and the one wearing a blue shirt with a green thing over the eyes is just barely ahead of the guy wearing an orange shirt and sunglasses.',
    'the guy is dead',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
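
Because the embeddings are unit-normalized, the same similarity call doubles as a ranking score for semantic search. A small follow-up to the snippet above; the query string is illustrative, not from the card.

```python
# Rank the three sentences above against a new query (continues the snippet above).
query_embedding = model.encode("A dog drinks water near a lake.")
scores = model.similarity(query_embedding, embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(sentences[best], scores[0, best].item())
```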

Evaluation

Metrics

Semantic Similarity

| Metric          | sts-dev | sts-test |
|:----------------|:--------|:---------|
| pearson_cosine  | 0.8649  | 0.8203   |
| spearman_cosine | 0.8649  | 0.8190   |
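
These are Pearson and Spearman correlations between the model's cosine similarity scores and human similarity judgments on STS sentence pairs. A hedged sketch of recomputing the sts-test numbers with the stock evaluator; the dataset ID and split are assumptions.

```python
# Sketch: re-run the STS evaluation with sentence-transformers' standard evaluator.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("aleynahukmet/all-MiniLM-L6-v2-8-layers")
stsb = load_dataset("sentence-transformers/stsb", split="test")  # assumed dataset ID

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-test",
)
print(evaluator(model))  # includes sts-test_pearson_cosine / sts-test_spearman_cosine
```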

Knowledge Distillation

| Metric       | Value   |
|:-------------|:--------|
| negative_mse | -0.0245 |
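
Here negative_mse is the mean squared error between the student's and the teacher's embeddings, negated and scaled by 100 so that values closer to zero are better; -0.0245 corresponds to an MSE of roughly 0.000245, consistent with the validation losses logged below. A minimal sketch of the computation; the teacher model ID is an assumption, since the card does not name the teacher.

```python
# Sketch: compute negative_mse between student and (assumed) teacher embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

student = SentenceTransformer("aleynahukmet/all-MiniLM-L6-v2-8-layers")
teacher = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed teacher

texts = ["Two women are embracing while holding to go packages."]
mse = np.mean((student.encode(texts) - teacher.encode(texts)) ** 2)
print(-mse * 100)  # negative_mse, in the scale reported above
```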

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,014,210 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence                                          | label              |
    |:--------|:--------------------------------------------------|:-------------------|
    | type    | string                                            | list               |
    | details | min: 4 tokens, mean: 12.24 tokens, max: 52 tokens | size: 384 elements |

  • Samples:

    | sentence | label |
    |:---------|:------|
    | A person on a horse jumps over a broken down airplane. | [-0.009216307662427425, 0.003964003175497055, 0.04029734805226326, 0.0030935262329876423, -0.03516044840216637, ...] |
    | Children smiling and waving at camera | [-0.03215238079428673, 0.06086821109056473, 0.013251038268208504, -0.017755677923560143, 0.07927625626325607, ...] |
    | A boy is jumping on skateboard in the middle of a red bridge. | [-0.020561737939715385, -0.03641558438539505, -0.039370208978652954, -0.0975518748164177, 0.005307587794959545, ...] |
  • Loss: MSELoss

Evaluation Dataset

Unnamed Dataset

  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence                                          | label              |
    |:--------|:--------------------------------------------------|:-------------------|
    | type    | string                                            | list               |
    | details | min: 5 tokens, mean: 13.23 tokens, max: 57 tokens | size: 384 elements |

  • Samples:

    | sentence | label |
    |:---------|:------|
    | Two women are embracing while holding to go packages. | [-0.007923883385956287, -0.024198176339268684, 0.034445445984601974, 0.036053989082574844, -0.06740871071815491, ...] |
    | Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | [-0.08869566023349762, 0.02789478376507759, 0.060685668140649796, -0.02580258436501026, 0.008359752595424652, ...] |
    | A man selling donuts to a customer during a world exhibition event held in the city of Angeles | [0.027255145832896233, 0.07622072845697403, 0.025504805147647858, -0.0542026124894619, -0.052822694182395935, ...] |
  • Loss: MSELoss (see the distillation sketch below)
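
In both datasets, each label is a precomputed 384-dimensional teacher embedding and MSELoss trains the student to regress onto it, i.e. standard embedding knowledge distillation. A sketch of assembling such data; the teacher model ID is again an assumption.

```python
# Sketch: build a sentence/label distillation dataset and the matching loss.
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MSELoss

teacher = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed teacher
student = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # student to distil

sentences = [
    "A person on a horse jumps over a broken down airplane.",
    "Children smiling and waving at camera",
]
# Each label is the teacher's 384-dim embedding for the sentence.
train_dataset = Dataset.from_dict({
    "sentence": sentences,
    "label": teacher.encode(sentences).tolist(),
})

loss = MSELoss(student)  # minimizes ||student(sentence) - label||^2
```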

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
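
These settings correspond one-to-one to fields of SentenceTransformerTrainingArguments in the v3 training API. A hedged sketch; output_dir is an assumption, and everything not listed keeps its default.

```python
# Sketch: the non-default hyperparameters above as training arguments.
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v2-8-layers",  # assumed
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=1e-4,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)
```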

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-dev_spearman_cosine negative_mse sts-test_spearman_cosine
0 0 - - 0.7048 -0.3846 -
0.0071 1000 0.0032 - - - -
0.0142 2000 0.0023 - - - -
0.0213 3000 0.0019 - - - -
0.0284 4000 0.0017 - - - -
0.0355 5000 0.0015 0.0013 0.8149 -0.1309 -
0.0426 6000 0.0014 - - - -
0.0497 7000 0.0012 - - - -
0.0568 8000 0.0011 - - - -
0.0639 9000 0.001 - - - -
0.0710 10000 0.001 0.0008 0.8495 -0.0754 -
0.0781 11000 0.0009 - - - -
0.0852 12000 0.0008 - - - -
0.0923 13000 0.0008 - - - -
0.0994 14000 0.0007 - - - -
0.1065 15000 0.0007 0.0005 0.8569 -0.0528 -
0.1136 16000 0.0007 - - - -
0.1207 17000 0.0007 - - - -
0.1278 18000 0.0006 - - - -
0.1349 19000 0.0006 - - - -
0.1420 20000 0.0006 0.0004 0.8589 -0.0438 -
0.1491 21000 0.0006 - - - -
0.1562 22000 0.0006 - - - -
0.1633 23000 0.0006 - - - -
0.1704 24000 0.0006 - - - -
0.1775 25000 0.0005 0.0004 0.8608 -0.0392 -
0.1846 26000 0.0005 - - - -
0.1917 27000 0.0005 - - - -
0.1988 28000 0.0005 - - - -
0.2059 29000 0.0005 - - - -
0.2130 30000 0.0005 0.0004 0.8619 -0.0363 -
0.2201 31000 0.0005 - - - -
0.2272 32000 0.0005 - - - -
0.2343 33000 0.0005 - - - -
0.2414 34000 0.0005 - - - -
0.2485 35000 0.0005 0.0003 0.8619 -0.0343 -
0.2556 36000 0.0005 - - - -
0.2627 37000 0.0005 - - - -
0.2698 38000 0.0005 - - - -
0.2769 39000 0.0005 - - - -
0.2840 40000 0.0005 0.0003 0.8613 -0.0329 -
0.2911 41000 0.0005 - - - -
0.2982 42000 0.0005 - - - -
0.3053 43000 0.0005 - - - -
0.3124 44000 0.0005 - - - -
0.3195 45000 0.0005 0.0003 0.8633 -0.0316 -
0.3266 46000 0.0005 - - - -
0.3337 47000 0.0005 - - - -
0.3408 48000 0.0005 - - - -
0.3479 49000 0.0004 - - - -
0.3550 50000 0.0004 0.0003 0.8631 -0.0306 -
0.3621 51000 0.0004 - - - -
0.3692 52000 0.0004 - - - -
0.3763 53000 0.0004 - - - -
0.3834 54000 0.0004 - - - -
0.3905 55000 0.0004 0.0003 0.8635 -0.0297 -
0.3976 56000 0.0004 - - - -
0.4047 57000 0.0004 - - - -
0.4118 58000 0.0004 - - - -
0.4189 59000 0.0004 - - - -
0.4260 60000 0.0004 0.0003 0.8640 -0.0290 -
0.4331 61000 0.0004 - - - -
0.4402 62000 0.0004 - - - -
0.4473 63000 0.0004 - - - -
0.4544 64000 0.0004 - - - -
0.4615 65000 0.0004 0.0003 0.8644 -0.0285 -
0.4686 66000 0.0004 - - - -
0.4757 67000 0.0004 - - - -
0.4828 68000 0.0004 - - - -
0.4899 69000 0.0004 - - - -
0.4970 70000 0.0004 0.0003 0.8641 -0.0280 -
0.5041 71000 0.0004 - - - -
0.5112 72000 0.0004 - - - -
0.5183 73000 0.0004 - - - -
0.5254 74000 0.0004 - - - -
0.5325 75000 0.0004 0.0003 0.8648 -0.0276 -
0.5396 76000 0.0004 - - - -
0.5467 77000 0.0004 - - - -
0.5538 78000 0.0004 - - - -
0.5609 79000 0.0004 - - - -
0.5680 80000 0.0004 0.0003 0.8644 -0.0271 -
0.5751 81000 0.0004 - - - -
0.5822 82000 0.0004 - - - -
0.5893 83000 0.0004 - - - -
0.5964 84000 0.0004 - - - -
0.6035 85000 0.0004 0.0003 0.8648 -0.0267 -
0.6106 86000 0.0004 - - - -
0.6177 87000 0.0004 - - - -
0.6248 88000 0.0004 - - - -
0.6319 89000 0.0004 - - - -
0.6390 90000 0.0004 0.0003 0.8645 -0.0264 -
0.6461 91000 0.0004 - - - -
0.6532 92000 0.0004 - - - -
0.6603 93000 0.0004 - - - -
0.6674 94000 0.0004 - - - -
0.6745 95000 0.0004 0.0003 0.8643 -0.0261 -
0.6816 96000 0.0004 - - - -
0.6887 97000 0.0004 - - - -
0.6958 98000 0.0004 - - - -
0.7029 99000 0.0004 - - - -
0.7100 100000 0.0004 0.0003 0.8643 -0.0259 -
0.7171 101000 0.0004 - - - -
0.7242 102000 0.0004 - - - -
0.7313 103000 0.0004 - - - -
0.7384 104000 0.0004 - - - -
0.7455 105000 0.0004 0.0003 0.8646 -0.0257 -
0.7526 106000 0.0004 - - - -
0.7597 107000 0.0004 - - - -
0.7668 108000 0.0004 - - - -
0.7739 109000 0.0004 - - - -
0.7810 110000 0.0004 0.0003 0.8637 -0.0254 -
0.7881 111000 0.0004 - - - -
0.7952 112000 0.0004 - - - -
0.8023 113000 0.0004 - - - -
0.8094 114000 0.0004 - - - -
0.8165 115000 0.0004 0.0003 0.8643 -0.0252 -
0.8236 116000 0.0004 - - - -
0.8307 117000 0.0004 - - - -
0.8378 118000 0.0004 - - - -
0.8449 119000 0.0004 - - - -
0.8520 120000 0.0004 0.0003 0.8645 -0.0250 -
0.8591 121000 0.0004 - - - -
0.8662 122000 0.0004 - - - -
0.8733 123000 0.0004 - - - -
0.8804 124000 0.0004 - - - -
0.8875 125000 0.0004 0.0002 0.8646 -0.0248 -
0.8946 126000 0.0004 - - - -
0.9017 127000 0.0004 - - - -
0.9088 128000 0.0004 - - - -
0.9159 129000 0.0004 - - - -
0.9230 130000 0.0004 0.0002 0.8647 -0.0247 -
0.9301 131000 0.0004 - - - -
0.9372 132000 0.0004 - - - -
0.9443 133000 0.0004 - - - -
0.9514 134000 0.0004 - - - -
0.9585 135000 0.0004 0.0002 0.8646 -0.0246 -
0.9656 136000 0.0004 - - - -
0.9727 137000 0.0004 - - - -
0.9798 138000 0.0004 - - - -
0.9869 139000 0.0004 - - - -
0.9940 140000 0.0004 0.0002 0.8649 -0.0245 -
1.0 140848 - - - - 0.8190
  • The final row (epoch 0.9940, step 140000) is the saved checkpoint; its sts-dev spearman_cosine (0.8649) and negative_mse (-0.0245) match the metrics reported above.

Framework Versions

  • Python: 3.12.4
  • Sentence Transformers: 3.3.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 1.0.1
  • Datasets: 2.19.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

MSELoss

```bibtex
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
```