Edit model card

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small on the stanfordnlp/snli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 tokens
  • Similarity Function: Cosine Similarity
  • Training Dataset:
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-small-SentenceTransformer-AdaptiveLayerBaseline")
# Run inference
sentences = [
    'people are standing near water with a boat heading their direction',
    'People are standing near water with a large blue boat heading their direction.',
    'The dogs are near the toy.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.766
spearman_cosine 0.7681
pearson_manhattan 0.7918
spearman_manhattan 0.7947
pearson_euclidean 0.7861
spearman_euclidean 0.7896
pearson_dot 0.6448
spearman_dot 0.6428
pearson_max 0.7918
spearman_max 0.7947

Binary Classification

Metric Value
cosine_accuracy 0.6731
cosine_accuracy_threshold 0.5815
cosine_f1 0.717
cosine_f1_threshold 0.4671
cosine_precision 0.5977
cosine_recall 0.8959
cosine_ap 0.7193
dot_accuracy 0.6445
dot_accuracy_threshold 71.9551
dot_f1 0.7094
dot_f1_threshold 53.7729
dot_precision 0.5779
dot_recall 0.9184
dot_ap 0.6828
manhattan_accuracy 0.6665
manhattan_accuracy_threshold 213.6252
manhattan_f1 0.7047
manhattan_f1_threshold 245.2058
manhattan_precision 0.5908
manhattan_recall 0.8729
manhattan_ap 0.7132
euclidean_accuracy 0.6621
euclidean_accuracy_threshold 10.3589
euclidean_f1 0.7024
euclidean_f1_threshold 12.0109
euclidean_precision 0.5865
euclidean_recall 0.8754
euclidean_ap 0.7102
max_accuracy 0.6731
max_accuracy_threshold 213.6252
max_f1 0.717
max_f1_threshold 245.2058
max_precision 0.5977
max_recall 0.9184
max_ap 0.7193

Training Details

Training Dataset

stanfordnlp/snli

  • Dataset: stanfordnlp/snli at cdb5c3d
  • Size: 314,315 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 label
    type string string int
    details
    • min: 5 tokens
    • mean: 16.62 tokens
    • max: 62 tokens
    • min: 4 tokens
    • mean: 9.46 tokens
    • max: 29 tokens
    • 0: 100.00%
  • Samples:
    sentence1 sentence2 label
    A person on a horse jumps over a broken down airplane. A person is outdoors, on a horse. 0
    Children smiling and waving at camera There are children present 0
    A boy is jumping on skateboard in the middle of a red bridge. The boy does a skateboarding trick. 0
  • Loss: AdaptiveLayerLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1,
        "prior_layers_weight": 1,
        "kl_div_weight": 1.2,
        "kl_temperature": 1.2
    }
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 5 tokens
    • mean: 14.77 tokens
    • max: 45 tokens
    • min: 6 tokens
    • mean: 14.74 tokens
    • max: 49 tokens
    • min: 0.0
    • mean: 0.47
    • max: 1.0
  • Samples:
    sentence1 sentence2 score
    A man with a hard hat is dancing. A man wearing a hard hat is dancing. 1.0
    A young child is riding a horse. A child is riding a horse. 0.95
    A man is feeding a mouse to a snake. The man is feeding a mouse to the snake. 1.0
  • Loss: AdaptiveLayerLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1,
        "prior_layers_weight": 1,
        "kl_div_weight": 1.2,
        "kl_temperature": 1.2
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • learning_rate: 5e-06
  • weight_decay: 1e-07
  • num_train_epochs: 2
  • warmup_ratio: 0.5
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTaV3-small-SentenceTransformer-AdaptiveLayerBaselinen
  • hub_strategy: checkpoint
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-06
  • weight_decay: 1e-07
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.5
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTaV3-small-SentenceTransformer-AdaptiveLayerBaselinen
  • hub_strategy: checkpoint
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss max_ap spearman_cosine
None 0 - 4.1425 - 0.4276
0.1001 983 4.7699 3.8387 0.6364 -
0.2001 1966 3.5997 2.7649 0.6722 -
0.3002 2949 2.811 2.3520 0.6838 -
0.4003 3932 2.414 2.0700 0.6883 -
0.5004 4915 2.186 1.8993 0.6913 -
0.6004 5898 1.8523 1.5632 0.7045 -
0.7005 6881 0.6415 1.4902 0.7082 -
0.8006 7864 0.5016 1.4636 0.7108 -
0.9006 8847 0.4194 1.3875 0.7121 -
1.0007 9830 0.3737 1.3077 0.7117 -
1.1008 10813 1.8087 1.0903 0.7172 -
1.2009 11796 1.6631 1.0388 0.7180 -
1.3009 12779 1.6161 1.0177 0.7169 -
1.4010 13762 1.5378 1.0136 0.7148 -
1.5011 14745 1.5215 1.0053 0.7159 -
1.6011 15728 1.2887 0.9600 0.7166 -
1.7012 16711 0.3058 0.9949 0.7180 -
1.8013 17694 0.2897 0.9792 0.7186 -
1.9014 18677 0.275 0.9598 0.7192 -
2.0 19646 - 0.9796 0.7193 -
None 0 - 2.4594 0.7193 0.7681

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

AdaptiveLayerLoss

@misc{li20242d,
    title={2D Matryoshka Sentence Embeddings}, 
    author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
    year={2024},
    eprint={2402.14776},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
2
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Datasets used to train bobox/DeBERTaV3-small-SentenceTransformer-AdaptiveLayerBaseline

Collection including bobox/DeBERTaV3-small-SentenceTransformer-AdaptiveLayerBaseline

Evaluation results