---
language:
  - en
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:404290
  - loss:OnlineContrastiveLoss
base_model: sentence-transformers/stsb-distilbert-base
widget:
  - source_sentence: Why Modi is putting a ban on 500 and 1000 notes?
    sentences:
      - Why making multiple fake accounts on Quora is illegal?
      - >-
        What are the advantages of the decision taken by the Government of India
        to scrap out 500 and 1000 rupees notes?
      - Why should I go for internships?
  - source_sentence: Where can I buy cheap t-shirts?
    sentences:
      - Where can I buy cheap wholesale t-shirts?
      - How can I make money from a blog?
      - What are the best places to shop in Charleston, SC?
  - source_sentence: What are the most important mobile applications?
    sentences:
      - How can I tell if my wife's vagina had a bigger penis inside?
      - What is the most important apps in your phone?
      - >-
        What do you think Ned Stark would have done or said to Jon Snow if he
        was able to join the Night’s Watch or escaped his beheading?
  - source_sentence: What is the whole process for making Android games with high graphics?
    sentences:
      - What lf I don't accept Jesus as God?
      - >-
        I have to masturbate3 times to feel an orgasm sometimes only2 times what
        is wrong with me I went to the doctor and they do not believe meWhat's
        wrong?
      - What does a healthy diet consist of?
  - source_sentence: Why do so many religious people believe in healing miracles?
    sentences:
      - Is Warframe better than Destiny?
      - What do you like about China?
      - Is believing in God a bad thing?
datasets:
  - sentence-transformers/quora-duplicates
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - cosine_mcc
  - average_precision
  - f1
  - precision
  - recall
  - threshold
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: quora duplicates
          type: quora-duplicates
        metrics:
          - type: cosine_accuracy
            value: 0.877
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.7857047319412231
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.8516284680337757
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.774639368057251
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.8209302325581396
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8847117794486216
            name: Cosine Recall
          - type: cosine_ap
            value: 0.8988328505183655
            name: Cosine Ap
          - type: cosine_mcc
            value: 0.7483655051498526
            name: Cosine Mcc
      - task:
          type: paraphrase-mining
          name: Paraphrase Mining
        dataset:
          name: quora duplicates dev
          type: quora-duplicates-dev
        metrics:
          - type: average_precision
            value: 0.5483042026376685
            name: Average Precision
          - type: f1
            value: 0.5606415792720543
            name: F1
          - type: precision
            value: 0.5539301735907939
            name: Precision
          - type: recall
            value: 0.5675176100314733
            name: Recall
          - type: threshold
            value: 0.8631762564182281
            name: Threshold
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.9308
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.969
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9778
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9854
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.9308
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4145333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.26696000000000003
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.14144
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8008592901379665
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9314231047351341
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9558165998609235
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9743579383296442
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9511384841680516
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9511976190476192
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.939071878001028
            name: Cosine Map@100
---

SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/quora-duplicates
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
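
The same stack can be assembled by hand from Sentence Transformers building blocks. A minimal sketch, using the public models.Transformer and models.Pooling modules (the printout should match the architecture above):

from sentence_transformers import SentenceTransformer, models

# DistilBERT backbone, truncating inputs at 128 tokens
word_embedding_model = models.Transformer(
    "sentence-transformers/stsb-distilbert-base", max_seq_length=128
)
# Mean pooling over token embeddings -> one 768-dimensional sentence vector
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
print(model)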

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'Why do so many religious people believe in healing miracles?',
    'Is believing in God a bad thing?',
    'What do you like about China?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
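
Because the model was trained to separate duplicate from non-duplicate question pairs, a natural use is pairwise duplicate detection by thresholding the cosine similarity. A minimal sketch, continuing from the snippet above; 0.7746 is the cosine_f1_threshold reported under Evaluation below, and the question pair is illustrative:

# Classify a question pair by thresholding the cosine similarity.
# 0.7746 is the cosine_f1_threshold from the evaluation below;
# tune it for your own data.
q1 = "Where can I buy cheap t-shirts?"
q2 = "Where can I buy cheap wholesale t-shirts?"
emb = model.encode([q1, q2])
score = float(model.similarity(emb, emb)[0][1])
print(f"cosine similarity: {score:.4f} -> duplicate: {score >= 0.7746}")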

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.877
cosine_accuracy_threshold 0.7857
cosine_f1 0.8516
cosine_f1_threshold 0.7746
cosine_precision 0.8209
cosine_recall 0.8847
cosine_ap 0.8988
cosine_mcc 0.7484
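
These figures were produced by a binary-classification evaluation over labeled question pairs. A minimal sketch using Sentence Transformers' BinaryClassificationEvaluator; the card does not state which rows were held out, so the first 1,000 training rows stand in purely for illustration:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")

# Columns: sentence1, sentence2, label (1 = duplicate)
pairs = load_dataset(
    "sentence-transformers/quora-duplicates", "pair-class", split="train"
).select(range(1000))

evaluator = BinaryClassificationEvaluator(
    sentences1=pairs["sentence1"],
    sentences2=pairs["sentence2"],
    labels=pairs["label"],
    name="quora-duplicates",
)
print(evaluator(model))  # dict with cosine_accuracy, cosine_f1, cosine_ap, ...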

Paraphrase Mining

Metric Value
average_precision 0.5483
f1 0.5606
precision 0.5539
recall 0.5675
threshold 0.8632
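
Paraphrase mining scores every pair within a collection rather than pre-defined pairs. A minimal sketch with util.paraphrase_mining on a small collection (the sentences are illustrative):

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")

sentences = [
    "Where can I buy cheap t-shirts?",
    "Where can I buy cheap wholesale t-shirts?",
    "What are the most important mobile applications?",
    "What is the most important apps in your phone?",
]
# Returns [score, i, j] triples sorted by decreasing cosine similarity
pairs = paraphrase_mining(model, sentences)
for score, i, j in pairs[:3]:
    print(f"{score:.4f}  {sentences[i]!r} <-> {sentences[j]!r}")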

Information Retrieval

Metric Value
cosine_accuracy@1 0.9308
cosine_accuracy@3 0.969
cosine_accuracy@5 0.9778
cosine_accuracy@10 0.9854
cosine_precision@1 0.9308
cosine_precision@3 0.4145
cosine_precision@5 0.267
cosine_precision@10 0.1414
cosine_recall@1 0.8009
cosine_recall@3 0.9314
cosine_recall@5 0.9558
cosine_recall@10 0.9744
cosine_ndcg@10 0.9511
cosine_mrr@10 0.9512
cosine_map@100 0.9391
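
The retrieval metrics above measure how often a duplicate question is ranked near the top when searching a corpus. A minimal semantic-search sketch with util.semantic_search (the corpus and query are illustrative):

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import semantic_search

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")

corpus = [
    "What are the advantages of scrapping 500 and 1000 rupee notes?",
    "Why should I go for internships?",
    "Is believing in God a bad thing?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(
    ["Why is Modi putting a ban on 500 and 1000 notes?"], convert_to_tensor=True
)

# One result list per query; each hit carries a corpus_id and a cosine score
hits = semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")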

Training Details

Training Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
      sentence1: string (min 6, mean 15.73, max 65 tokens)
      sentence2: string (min 6, mean 15.93, max 85 tokens)
      label: int (0: ~61.60%, 1: ~38.40%)
  • Samples:
      sentence1: How can Trump supporters claim he didn't mock a disabled reporter when there is live footage of him mocking a disabled reporter?
      sentence2: Why don't people actually watch the Trump video of him allegedly mocking a disabled reporter?
      label: 0

      sentence1: Where can I get the best digital marketing course (online & offline) in India?
      sentence2: Which is the best digital marketing institute for professionals in India?
      label: 1

      sentence1: What best two liner shayri?
      sentence2: What does "senile dementia, uncomplicated" mean in medical terms?
      label: 0
  • Loss: OnlineContrastiveLoss (see the loading sketch below)
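
A minimal sketch of loading this dataset and pairing it with the loss; the revision pin matches the commit noted above:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# Columns: sentence1, sentence2, label (1 = duplicate, 0 = not a duplicate)
dataset = load_dataset(
    "sentence-transformers/quora-duplicates", "pair-class",
    split="train", revision="451a485",
)

# OnlineContrastiveLoss selects hard positives and hard negatives within
# each batch; defaults to cosine distance with margin 0.5
loss = losses.OnlineContrastiveLoss(model)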

Evaluation Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
      sentence1: string (min 6, mean 16.14, max 70 tokens)
      sentence2: string (min 6, mean 15.92, max 74 tokens)
      label: int (0: ~60.10%, 1: ~39.90%)
  • Samples:
      sentence1: What are some must subscribe RSS feeds?
      sentence2: What are RSS feeds?
      label: 0

      sentence1: How close are Madonna and Hillary Clinton?
      sentence2: Why do people say Hillary Clinton is a crook?
      label: 0

      sentence1: Can you share best day of your life?
      sentence2: What is the Best Day of your life till date?
      label: 1
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates (see the trainer sketch below)
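
A minimal sketch wiring these values into the Sentence Transformers v3 trainer; the output directory and the evaluation split are assumptions, since the card does not state how evaluation rows were held out:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
# Assumption: the card does not specify the held-out evaluation rows
split = dataset.train_test_split(test_size=10_000, seed=42)
loss = losses.OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",      # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # the "no_duplicates" sampler
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    loss=loss,
)
trainer.train()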

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss quora-duplicates_cosine_ap quora-duplicates-dev_average_precision cosine_ndcg@10
0 0 - - 0.7458 0.4200 0.9390
0.0640 100 2.5263 - - - -
0.1280 200 2.1489 - - - -
0.1599 250 - 1.8621 0.8433 0.3907 0.9329
0.1919 300 2.0353 - - - -
0.2559 400 1.7831 - - - -
0.3199 500 1.8887 1.7744 0.8662 0.4924 0.9379
0.3839 600 1.7814 - - - -
0.4479 700 1.7775 - - - -
0.4798 750 - 1.6468 0.8766 0.4945 0.9399
0.5118 800 1.6835 - - - -
0.5758 900 1.6974 - - - -
0.6398 1000 1.5704 1.4925 0.8895 0.5283 0.9460
0.7038 1100 1.6771 - - - -
0.7678 1200 1.619 - - - -
0.7997 1250 - 1.4311 0.8982 0.5252 0.9466
0.8317 1300 1.6119 - - - -
0.8957 1400 1.6043 - - - -
0.9597 1500 1.6848 1.4070 0.8988 0.5483 0.9511

Framework Versions

  • Python: 3.9.18
  • Sentence Transformers: 3.4.1
  • Transformers: 4.44.2
  • PyTorch: 2.2.1+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.19.0
  • Tokenizers: 0.19.1
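
To approximate this environment, the key packages can be pinned; a sketch (the matching CUDA build of PyTorch depends on your platform):

pip install "torch==2.2.1" "sentence-transformers==3.4.1" "transformers==4.44.2" "accelerate==1.3.0" "datasets==2.19.0" "tokenizers==0.19.1"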

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}