
SentenceTransformer based on microsoft/deberta-v3-xsmall

This is a sentence-transformers model finetuned from microsoft/deberta-v3-xsmall on the stanfordnlp/snli dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-xsmall
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: stanfordnlp/snli
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
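
The Pooling module mean-pools DeBERTa's token embeddings into one 384-dimensional vector per input. As a rough sketch of what the two modules above compute, expressed with the plain transformers API instead of sentence-transformers (the model id comes from the Usage section below):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

encoded = tokenizer(
    ["A person is outdoors, on a horse."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 384)

# Mean pooling: average the token embeddings, masking out padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 384])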

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03")
# Run inference
sentences = [
    'in each square',
    'It is widespread.',
    'A young girl flips an omelet.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
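
Because similarity defaults to the cosine function listed under Model Details, the same two calls cover simple semantic search, e.g. ranking a small corpus against a query (the corpus sentences here are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03")

query_embedding = model.encode(["A person is riding a horse."])
corpus_embeddings = model.encode([
    "A young child is riding a horse.",
    "A man is feeding a mouse to a snake.",
])
# Cosine scores of the query against every corpus sentence: shape [1, 2]
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(best, float(scores[0, best]))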

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.7972
spearman_cosine 0.807
pearson_manhattan 0.8079
spearman_manhattan 0.8072
pearson_euclidean 0.8084
spearman_euclidean 0.8073
pearson_dot 0.7029
spearman_dot 0.6909
pearson_max 0.8084
spearman_max 0.8073
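
These correlations compare the model's pairwise similarities against gold scores on an STS development split. A minimal sketch of how, for example, pearson_cosine and spearman_cosine are obtained, reusing the three sample pairs from the evaluation dataset listed below:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03")

sentences1 = [
    "A man with a hard hat is dancing.",
    "A young child is riding a horse.",
    "A man is feeding a mouse to a snake.",
]
sentences2 = [
    "A man wearing a hard hat is dancing.",
    "A child is riding a horse.",
    "The man is feeding a mouse to the snake.",
]
gold = [1.0, 0.95, 1.0]

emb1 = model.encode(sentences1)
emb2 = model.encode(sentences2)

# Row-wise cosine similarity between each (sentence1, sentence2) pair
cosine = (emb1 * emb2).sum(axis=1) / (
    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
)
print(pearsonr(cosine, gold)[0], spearmanr(cosine, gold)[0])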

Binary Classification

Metric Value
cosine_accuracy 0.6772
cosine_accuracy_threshold 0.7285
cosine_f1 0.7187
cosine_f1_threshold 0.6111
cosine_precision 0.611
cosine_recall 0.8724
cosine_ap 0.7392
dot_accuracy 0.6383
dot_accuracy_threshold 228.4041
dot_f1 0.7068
dot_f1_threshold 177.3942
dot_precision 0.5811
dot_recall 0.9017
dot_ap 0.6904
manhattan_accuracy 0.6635
manhattan_accuracy_threshold 174.6275
manhattan_f1 0.7054
manhattan_f1_threshold 232.6788
manhattan_precision 0.5772
manhattan_recall 0.907
manhattan_ap 0.7282
euclidean_accuracy 0.6651
euclidean_accuracy_threshold 13.4225
euclidean_f1 0.7068
euclidean_f1_threshold 17.6348
euclidean_precision 0.5756
euclidean_recall 0.9154
euclidean_ap 0.7303
max_accuracy 0.6772
max_accuracy_threshold 228.4041
max_f1 0.7187
max_f1_threshold 232.6788
max_precision 0.611
max_recall 0.9154
max_ap 0.7392
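
Each *_threshold is the decision boundary that maximized the corresponding metric on the evaluation pairs, so applying the model as a binary classifier just means thresholding the similarity score. A minimal sketch using the cosine_f1_threshold from the table above:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03")
COSINE_F1_THRESHOLD = 0.6111  # cosine_f1_threshold reported above

def is_similar(sentence_a: str, sentence_b: str) -> bool:
    # Label a pair as "similar" when its cosine similarity clears the threshold
    emb_a, emb_b = model.encode([sentence_a, sentence_b])
    cosine = float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
    return cosine >= COSINE_F1_THRESHOLD

print(is_similar("A child is riding a horse.", "A young child is riding a horse."))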

Training Details

Training Dataset

stanfordnlp/snli

  • Dataset: stanfordnlp/snli at cdb5c3d
  • Size: 314,315 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1: string, min 5 / mean 16.62 / max 62 tokens
    sentence2: string, min 4 / mean 9.46 / max 29 tokens
    label: int, 0: 100.00%
  • Samples:
    • sentence1: A person on a horse jumps over a broken down airplane.
      sentence2: A person is outdoors, on a horse.
      label: 0
    • sentence1: Children smiling and waving at camera
      sentence2: There are children present
      label: 0
    • sentence1: A boy is jumping on skateboard in the middle of a red bridge.
      sentence2: The boy does a skateboarding trick.
      label: 0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
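
MultipleNegativesRankingLoss treats every other sentence2 in a batch as an in-batch negative for each sentence1 anchor: similarities are scaled by 20.0 and the matching pair must win a softmax over the whole batch. A rough sketch of the computation, not the library's exact implementation:

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(
    anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0
) -> torch.Tensor:
    # Scaled cosine similarity of every anchor against every candidate: (batch, batch)
    scores = scale * F.cosine_similarity(
        anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1
    )
    # The true positive for row i sits at column i; other columns act as negatives
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)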
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1: string, min 5 / mean 14.77 / max 45 tokens
    sentence2: string, min 6 / mean 14.74 / max 49 tokens
    score: float, min 0.0 / mean 0.47 / max 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing.
      sentence2: A man wearing a hard hat is dancing.
      score: 1.0
    • sentence1: A young child is riding a horse.
      sentence2: A child is riding a horse.
      score: 0.95
    • sentence1: A man is feeding a mouse to a snake.
      sentence2: The man is feeding a mouse to the snake.
      score: 1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 7.5e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.25
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03n
  • hub_strategy: checkpoint
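
As a hedged sketch, a comparable run could be launched with the Sentence Transformers 3.x trainer API using the non-default values above. The SNLI filtering is an assumption inferred from the dataset statistics (label 0 only), and the exact preprocessing behind the published checkpoint is not shown in this card; eval and hub options are omitted to keep the sketch self-contained:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/deberta-v3-xsmall")

# Assumption: keep only entailment pairs (label 0), matching the statistics above;
# dropping the label column leaves (premise, hypothesis) as (anchor, positive)
train_dataset = (
    load_dataset("stanfordnlp/snli", split="train")
    .filter(lambda ex: ex["label"] == 0)
    .remove_columns("label")
)

args = SentenceTransformerTrainingArguments(
    output_dir="deberta-v3-xsmall-st",  # hypothetical local path
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=7.5e-5,
    num_train_epochs=2,
    warmup_ratio=0.25,
    save_safetensors=False,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()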

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 7.5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.25
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTaV3-xSmall-SentenceTransformer-0.03n
  • hub_strategy: checkpoint
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss  max_ap  sts-dev_spearman_cosine
None 0 - 3.7624 0.5721 0.4168
0.0501 246 3.3825 - - -
0.1002 492 1.8307 - - -
0.1500 737 - 1.0084 0.7024 -
0.1502 738 1.055 - - -
0.2003 984 0.7961 - - -
0.2504 1230 0.6859 - - -
0.3001 1474 - 0.7410 0.7191 -
0.3005 1476 0.5914 - - -
0.3506 1722 0.5324 - - -
0.4007 1968 0.5077 - - -
0.4501 2211 - 0.6152 0.7144 -
0.4507 2214 0.4647 - - -
0.5008 2460 0.4443 - - -
0.5509 2706 0.4169 - - -
0.6002 2948 - 0.5820 0.7207 -
0.6010 2952 0.3831 - - -
0.6511 3198 0.393 - - -
0.7011 3444 0.3654 - - -
0.7502 3685 - 0.5284 0.7264 -
0.7512 3690 0.344 - - -
0.8013 3936 0.3336 - - -
0.8514 4182 0.3382 - - -
0.9002 4422 - 0.4911 0.7294 -
0.9015 4428 0.3182 - - -
0.9515 4674 0.3213 - - -
1.0016 4920 0.3032 - - -
1.0503 5159 - 0.4777 0.7325 -
1.0517 5166 0.2526 - - -
1.1018 5412 0.2652 - - -
1.1519 5658 0.2538 - - -
1.2003 5896 - 0.4569 0.7331 -
1.2020 5904 0.2454 - - -
1.2520 6150 0.2528 - - -
1.3021 6396 0.2448 - - -
1.3504 6633 - 0.4334 0.7370 -
1.3522 6642 0.2282 - - -
1.4023 6888 0.2295 - - -
1.4524 7134 0.2313 - - -
1.5004 7370 - 0.4237 0.7342 -
1.5024 7380 0.2218 - - -
1.5525 7626 0.2246 - - -
1.6026 7872 0.218 - - -
1.6504 8107 - 0.4102 0.7388 -
1.6527 8118 0.2095 - - -
1.7028 8364 0.2114 - - -
1.7529 8610 0.2063 - - -
1.8005 8844 - 0.4075 0.7370 -
1.8029 8856 0.1968 - - -
1.8530 9102 0.2061 - - -
1.9031 9348 0.2089 - - -
1.9505 9581 - 0.3978 0.7395 -
1.9532 9594 0.2005 - - -
2.0 9824 - 0.3963 0.7392 -
None 0 - 1.5506 - 0.8070

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}