SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: quora-duplicates

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
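
The architecture above can also be inspected programmatically. A minimal sketch; the printed values follow from the Transformer and Pooling configuration shown above:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
# These reflect the Transformer and Pooling configuration above
print(model.max_seq_length)                      # 128
print(model.get_sentence_embedding_dimension())  # 768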

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'What is the best fact checking sources that all Quorans will most trust?',
    'What is the most memorable book that Quorans have read?',
    'Is working in McKinsey one of the best and surest ways to get into Harvard Business School?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

Metric                      Value
cosine_accuracy             0.869
cosine_accuracy_threshold   0.8137
cosine_f1                   0.839
cosine_f1_threshold         0.7617
cosine_precision            0.7818
cosine_recall               0.9053
cosine_ap                   0.8853
cosine_mcc                  0.7338
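
As a hedged illustration of how these thresholds could be applied, the sketch below labels a question pair as a duplicate when its cosine similarity reaches the reported cosine_f1_threshold (0.7617); the example pairs are hypothetical:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
pairs = [
    ("How do I learn Python?", "What is the best way to learn Python?"),
    ("How do I learn Python?", "How do I cook rice?"),
]
emb1 = model.encode([a for a, _ in pairs])
emb2 = model.encode([b for _, b in pairs])
# similarity_pairwise scores row i of emb1 against row i of emb2 (cosine by default)
scores = model.similarity_pairwise(emb1, emb2)
for (a, b), score in zip(pairs, scores):
    print(f"{score:.4f}  duplicate={bool(score >= 0.7617)}  {a!r} vs {b!r}")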

Paraphrase Mining

Metric              Value
average_precision   0.5427
f1                  0.5533
precision           0.5508
recall              0.5557
threshold           0.8659
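
This use case can be exercised with util.paraphrase_mining, which scores all sentence pairs within a corpus. A minimal sketch using the threshold reported above (0.8659) and a hypothetical corpus:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
corpus = [
    "How can I lose weight quickly?",
    "What are effective ways to lose weight fast?",
    "How do I learn Python?",
]
# Returns [score, i, j] triples, sorted by decreasing cosine similarity
for score, i, j in paraphrase_mining(model, corpus):
    if score >= 0.8659:  # threshold reported above
        print(f"{score:.4f}  {corpus[i]!r} <-> {corpus[j]!r}")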

Information Retrieval

Metric                Value
cosine_accuracy@1     0.9298
cosine_accuracy@3     0.9732
cosine_accuracy@5     0.982
cosine_accuracy@10    0.9868
cosine_precision@1    0.9298
cosine_precision@3    0.4154
cosine_precision@5    0.2679
cosine_precision@10   0.1417
cosine_recall@1       0.8009
cosine_recall@3       0.9349
cosine_recall@5       0.9611
cosine_recall@10      0.9765
cosine_ndcg@10        0.9526
cosine_mrr@10         0.9522
cosine_map@100        0.94
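
For the retrieval use case these metrics describe, util.semantic_search ranks a corpus against a query. A minimal sketch with a hypothetical corpus and query:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
corpus = [
    "How do I improve my English speaking skills?",
    "What are the best books on machine learning?",
    "How can I start investing in stocks?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Good resources to learn ML?", convert_to_tensor=True)
# Returns one ranked hit list per query; each hit has 'corpus_id' and 'score'
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")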

Training Details

Training Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
                sentence1            sentence2            label
    type        string               string               int
    details     min: 6 tokens        min: 6 tokens        0: ~64.40%
                mean: 16.01 tokens   mean: 15.9 tokens    1: ~35.60%
                max: 67 tokens       max: 72 tokens
  • Samples:
    • sentence1: How much worse do things need to get before the "blue" states cut off welfare to the "red" states?
      sentence2: If the red states and the blue states were separated into two countries, which country would be more successful?
      label: 0
    • sentence1: Can you offer me any advice on how to lose weight?
      sentence2: What are the best ways to lose weight? What is the best diet plan?
      label: 1
    • sentence1: How do I break my knee?
      sentence2: How do I break my elbow?
      label: 0
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
                sentence1            sentence2            label
    type        string               string               int
    details     min: 6 tokens        min: 6 tokens        0: ~62.00%
                mean: 15.98 tokens   mean: 15.9 tokens    1: ~38.00%
                max: 53 tokens       max: 77 tokens
  • Samples:
    • sentence1: Which is the best SAP online training centre at Hyderabad?
      sentence2: Which is the best sap workflow online training institute in Hyderabad?
      label: 1
    • sentence1: How did World War Two start?
      sentence2: What will most likely cause World War III?
      label: 0
    • sentence1: How do I find a unique string from a given string in Java without methods such as split, contain, and divide?
      sentence2: How can I split the string "[] {() <>} []" into " [,], {, (, ..." in Java?
      label: 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
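
A minimal training sketch consistent with the dataset, loss, and non-default hyperparameters listed above; the output directory, dataset subset name, and eval split size are assumptions, not taken from this card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")
# Assumed subset name: "pair-class" provides sentence1/sentence2/label columns
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
dataset = dataset.train_test_split(test_size=1000, seed=42)  # assumed eval split
loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",  # assumed
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()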

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss   Validation Loss   quora-duplicates_cosine_ap   quora-duplicates-dev_average_precision   cosine_ndcg@10
0        0      -               -                 0.7402                       0.4200                                   0.9413
0.0640   100    2.481           -                 -                            -                                        -
0.1280   200    2.1466          -                 -                            -                                        -
0.1599   250    -               1.7997            0.8327                       0.4596                                   0.9355
0.1919   300    2.0354          -                 -                            -                                        -
0.2559   400    1.9342          -                 -                            -                                        -
0.3199   500    1.9132          1.6231            0.8617                       0.4896                                   0.9425
0.3839   600    1.8015          -                 -                            -                                        -
0.4479   700    1.7407          -                 -                            -                                        -
0.4798   750    -               1.4953            0.8737                       0.5112                                   0.9468
0.5118   800    1.6454          -                 -                            -                                        -
0.5758   900    1.6568          -                 -                            -                                        -
0.6398   1000   1.6811          1.4678            0.8751                       0.5290                                   0.9457
0.7038   1100   1.711           -                 -                            -                                        -
0.7678   1200   1.6449          -                 -                            -                                        -
0.7997   1250   -               1.4363            0.8811                       0.5327                                   0.9507
0.8317   1300   1.5921          -                 -                            -                                        -
0.8957   1400   1.5062          -                 -                            -                                        -
0.9597   1500   1.5728          1.4029            0.8853                       0.5427                                   0.9526

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
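
To approximate this environment, the versions above can be pinned at install time (a sketch; PyTorch should be installed separately for your CUDA setup):

pip install sentence-transformers==3.4.0 transformers==4.48.1 accelerate==1.3.0 datasets==3.2.0 tokenizers==0.21.0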

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}