metadata
base_model: sileod/deberta-v3-large-tasksource-nli
datasets:
- PiC/phrase_similarity
language:
- en
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:7004
- loss:SoftmaxLoss
widget:
- source_sentence: >-
The valve will open 100% when the set point is reached and will remain
open until a certain blow down factor is reached.
sentences:
- >-
Having raised $17,000,000 in a standard matter, one of the first
speculative IPOs, Tucker needed more money to continue development of
the car.
- >-
The valve will open 100% when the tennis scoring protocol is reached and
will remain open until a certain blow down factor is reached.
- >-
But the government of PML (N) gave it the complete exponential of a
Tehsil.
- source_sentence: >-
Java BluePrints was the first source to promote Model View Controller
(MVC) and Data Access Object (DAO) for Java EE application development.
sentences:
- >-
Java BluePrints was the pioneer authority to promote Model View
Controller (MVC) and Data Access Object (DAO) for Java EE application
development.
- >-
One of the primary job of IIUG is to publish news through a monthly
newsletter ("The Insider").
- >-
Opera Dragonfly must be downloaded on original practice, and functions
offline thereafter.
- source_sentence: It also appears immediately after the first shower of the monsoon.
sentences:
- >-
The latter can be minimised by meticulous precision to the wheel
bearings, tyre sizes and pressures, and brakes (to avoid parasitic brake
drag).
- It also appears immediately after the initial rain of the monsoon.
- >-
McCullough filed a second appeal that could not be denied without a
hearing from the State Attorney's Office.
- source_sentence: >-
This type places the shifters closer to the hand positions, but still
offer a simple reliable system, especially for touring cyclist.
sentences:
- >-
This type places the shifters closer to the palm placement, but still
offer a simple reliable system, especially for touring cyclist.
- >-
All square dancers learn standard "definitions" of calls, which they
recall and use when the caller issues a certain directive.
- >-
Mainos-TV operated by leasing atmospheric duration from Yleisradio,
broadcasting in reserved blocks between Yleisradio's own programming on
its two channels.
- source_sentence: >-
He also played with the Turkish 2nd Division team Pertevniyal, which was
at the time the farm team of Efes, via a dual license.
sentences:
- >-
The group is still active, producing a monthly action points on the
women, peace, and authentication blocks affecting countries on Council's
agenda.
- >-
Storage/centre tracks are found in the vicinity of the following
stations:
Other song highlights.
- >-
He also played with the Turkish 2nd Division team Pertevniyal, which was
at the time the farm team of Efes, via a two-part authorization.
model-index:
- name: SentenceTransformer based on sileod/deberta-v3-large-tasksource-nli
results:
- task:
type: binary-classification
name: Binary Classification
dataset:
name: quora duplicates dev
type: quora-duplicates-dev
metrics:
- type: cosine_accuracy
value: 0.753
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.8562747240066528
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.7734303912647863
name: Cosine F1
- type: cosine_f1_threshold
value: 0.827180027961731
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.7095158597662772
name: Cosine Precision
- type: cosine_recall
value: 0.85
name: Cosine Recall
- type: cosine_ap
value: 0.7593865167351814
name: Cosine Ap
- type: dot_accuracy
value: 0.716
name: Dot Accuracy
- type: dot_accuracy_threshold
value: 472.6572265625
name: Dot Accuracy Threshold
- type: dot_f1
value: 0.7501982553528945
name: Dot F1
- type: dot_f1_threshold
value: 343.77313232421875
name: Dot F1 Threshold
- type: dot_precision
value: 0.621550591327201
name: Dot Precision
- type: dot_recall
value: 0.946
name: Dot Recall
- type: dot_ap
value: 0.6945003367753116
name: Dot Ap
- type: manhattan_accuracy
value: 0.754
name: Manhattan Accuracy
- type: manhattan_accuracy_threshold
value: 320.8356018066406
name: Manhattan Accuracy Threshold
- type: manhattan_f1
value: 0.7716105550500454
name: Manhattan F1
- type: manhattan_f1_threshold
value: 356.869140625
name: Manhattan F1 Threshold
- type: manhattan_precision
value: 0.7078464106844741
name: Manhattan Precision
- type: manhattan_recall
value: 0.848
name: Manhattan Recall
- type: manhattan_ap
value: 0.75919098072954
name: Manhattan Ap
- type: euclidean_accuracy
value: 0.751
name: Euclidean Accuracy
- type: euclidean_accuracy_threshold
value: 13.484582901000977
name: Euclidean Accuracy Threshold
- type: euclidean_f1
value: 0.7697777777777778
name: Euclidean F1
- type: euclidean_f1_threshold
value: 15.105815887451172
name: Euclidean F1 Threshold
- type: euclidean_precision
value: 0.6928
name: Euclidean Precision
- type: euclidean_recall
value: 0.866
name: Euclidean Recall
- type: euclidean_ap
value: 0.7572975810714628
name: Euclidean Ap
- type: max_accuracy
value: 0.754
name: Max Accuracy
- type: max_accuracy_threshold
value: 472.6572265625
name: Max Accuracy Threshold
- type: max_f1
value: 0.7734303912647863
name: Max F1
- type: max_f1_threshold
value: 356.869140625
name: Max F1 Threshold
- type: max_precision
value: 0.7095158597662772
name: Max Precision
- type: max_recall
value: 0.946
name: Max Recall
- type: max_ap
value: 0.7593865167351814
name: Max Ap
SentenceTransformer based on sileod/deberta-v3-large-tasksource-nli
This is a sentence-transformers model finetuned from sileod/deberta-v3-large-tasksource-nli on the PiC/phrase_similarity dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sileod/deberta-v3-large-tasksource-nli
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: PiC/phrase_similarity
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
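The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings. A minimal NumPy sketch of that step follows, assuming padding positions are excluded through an attention mask; the random arrays are hypothetical stand-ins for the output of the DebertaV2Model, and `mean_pool` is an illustrative name, not a library function.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, hidden) array
    attention_mask:   (batch, seq_len) array of 1 (real token) / 0 (padding)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# Toy input: 2 sequences, 4 token positions, hidden size 1024 as in the Pooling config.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 1024))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings.shape)  # (2, 1024)
```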
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Deehan1866/finetuned-valloss-sileod-deberta-v3-large-tasksource-nli")
# Run inference
sentences = [
'He also played with the Turkish 2nd Division team Pertevniyal, which was at the time the farm team of Efes, via a dual license.',
'He also played with the Turkish 2nd Division team Pertevniyal, which was at the time the farm team of Efes, via a two-part authorization.',
'Storage/centre tracks are found in the vicinity of the following stations:\nOther song highlights.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
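For reference, the cosine similarity listed as this model's similarity function can be computed by hand. The sketch below uses hypothetical three-dimensional toy vectors in place of real 1024-dimensional embeddings, and `cosine_similarity` is an illustrative helper rather than part of the library.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for sentence embeddings
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, so the similarity is close to 1
c = [-3.0, 0.0, 1.0]   # orthogonal to a, so the similarity is close to 0
print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```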
Evaluation
Metrics
Binary Classification
- Dataset: quora-duplicates-dev
- Evaluated with BinaryClassificationEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.753 |
cosine_accuracy_threshold | 0.8563 |
cosine_f1 | 0.7734 |
cosine_f1_threshold | 0.8272 |
cosine_precision | 0.7095 |
cosine_recall | 0.85 |
cosine_ap | 0.7594 |
dot_accuracy | 0.716 |
dot_accuracy_threshold | 472.6572 |
dot_f1 | 0.7502 |
dot_f1_threshold | 343.7731 |
dot_precision | 0.6216 |
dot_recall | 0.946 |
dot_ap | 0.6945 |
manhattan_accuracy | 0.754 |
manhattan_accuracy_threshold | 320.8356 |
manhattan_f1 | 0.7716 |
manhattan_f1_threshold | 356.8691 |
manhattan_precision | 0.7078 |
manhattan_recall | 0.848 |
manhattan_ap | 0.7592 |
euclidean_accuracy | 0.751 |
euclidean_accuracy_threshold | 13.4846 |
euclidean_f1 | 0.7698 |
euclidean_f1_threshold | 15.1058 |
euclidean_precision | 0.6928 |
euclidean_recall | 0.866 |
euclidean_ap | 0.7573 |
max_accuracy | 0.754 |
max_accuracy_threshold | 472.6572 |
max_f1 | 0.7734 |
max_f1_threshold | 356.8691 |
max_precision | 0.7095 |
max_recall | 0.946 |
max_ap | 0.7594 |
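Each `*_threshold` value in the table is the cut-off at which the corresponding accuracy or F1 was measured. The sketch below shows, under simple assumptions, how a threshold turns similarity scores into binary predictions and metrics; the scores and labels are hypothetical, and `binary_metrics` is an illustrative helper, not the evaluator's own code. The last lines check that the reported cosine F1 is the harmonic mean of the reported cosine precision and recall.

```python
def binary_metrics(scores, labels, threshold):
    """Predict positive when score >= threshold, then score the predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(labels),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical cosine scores and gold labels for six pairs (not produced by the model)
scores = [0.95, 0.90, 0.84, 0.88, 0.60, 0.70]
labels = [1, 1, 1, 0, 0, 0]
toy = binary_metrics(scores, labels, threshold=0.8563)  # cosine_accuracy_threshold
print(toy)

# Consistency check on the reported cosine metrics: F1 = 2PR / (P + R)
reported_precision = 0.7095158597662772
reported_recall = 0.85
reported_f1 = 2 * reported_precision * reported_recall / (reported_precision + reported_recall)
print(round(reported_f1, 4))  # 0.7734
```

For the Manhattan and Euclidean rows the scores are distances, so a pair would be predicted positive when the distance falls at or below the threshold rather than above it.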
Training Details
Training Dataset
PiC/phrase_similarity
- Dataset: PiC/phrase_similarity at fc67ce7
- Size: 7,004 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
Property | sentence1 | sentence2 | label |
---|---|---|---|
type | string | string | int |
details | min: 12 tokens; mean: 25.5 tokens; max: 57 tokens | min: 12 tokens; mean: 25.9 tokens; max: 58 tokens | 0: ~48.80%; 1: ~51.20% |
- Samples:
sentence1 | sentence2 | label |
---|---|---|
newly formed camp is released from the membrane and diffuses across the intracellular space where it serves to activate pka. | recently made encampment is released from the membrane and diffuses across the intracellular space where it serves to activate pka. | 0 |
According to one data, in 1910, on others – in 1915, the mansion became Natalya Dmitriyevna Shchuchkina's property. | According to a particular statistic, in 1910, on others – in 1915, the mansion became Natalya Dmitriyevna Shchuchkina's property. | 1 |
Note that Fact 1 does not assume any particular structure on the set formula_65. | Note that Fact 1 does not assume any specific edifice on the set formula_65. | 0 |
- Loss: SoftmaxLoss
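As a rough illustration of the SoftmaxLoss objective from the Sentence-BERT paper cited below: the two sentence embeddings u and v are combined into a feature vector, passed through a linear classifier, and trained with cross-entropy against the label. The sketch assumes the feature vector is the concatenation [u, v, |u - v|], which is the configuration described in that paper; this card does not state which concatenation options were used. All sizes and values are toy stand-ins, and `softmax_loss` is an illustrative function, not the library class.

```python
import numpy as np

def softmax_loss(u, v, labels, W, b):
    """Cross-entropy over a linear classifier applied to [u, v, |u - v|].

    u, v:   (batch, dim) sentence embeddings for sentence1 and sentence2
    labels: (batch,) integer class ids
    W, b:   classifier weights (3 * dim, num_labels) and bias (num_labels,)
    """
    features = np.concatenate([u, v, np.abs(u - v)], axis=1)  # (batch, 3 * dim)
    logits = features @ W + b                                 # (batch, num_labels)
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
dim, num_labels, batch = 8, 2, 4      # toy sizes; the model itself uses dim = 1024
u = rng.normal(size=(batch, dim))
v = rng.normal(size=(batch, dim))
labels = np.array([0, 1, 1, 0])
W = np.zeros((3 * dim, num_labels))   # untrained classifier: all logits are zero
b = np.zeros(num_labels)
loss = softmax_loss(u, v, labels, W, b)
print(round(float(loss), 4))  # 0.6931, i.e. ln(2) for a uniform two-class prediction
```

A two-class classifier that has not learned anything yields a loss of ln(2) ≈ 0.693, which is consistent with the first logged evaluation loss of 0.6795 in the training logs.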
Evaluation Dataset
PiC/phrase_similarity
- Dataset: PiC/phrase_similarity at fc67ce7
- Size: 1,000 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
Property | sentence1 | sentence2 | label |
---|---|---|---|
type | string | string | int |
details | min: 10 tokens; mean: 25.46 tokens; max: 58 tokens | min: 11 tokens; mean: 25.84 tokens; max: 59 tokens | 0: ~50.00%; 1: ~50.00% |
- Samples:
sentence1 | sentence2 | label |
---|---|---|
after theo's apparent death, she decides to leave first colony and ends up traveling with the apostles. | after theo's apparent death, she decides to leave original settlement and ends up traveling with the apostles. | 0 |
The guard assigned to Vivian leaves her to prevent the robbery, allowing her to connect to the bank's network. | The guard assigned to Vivian leaves her to prevent the robbery, allowing her to connect to the bank's locations. | 0 |
Two days later Louis XVI banished Necker by a "lettre de cachet" for his very public exchange of pamphlets. | Two days later Louis XVI banished Necker by a "lettre de cachet" for his very free forum of pamphlets. | 0 |
- Loss: SoftmaxLoss
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 100
- warmup_ratio: 0.1
- load_best_model_at_end: True
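The settings above, together with the training set size of 7,004 samples, determine the length of the learning-rate schedule. The arithmetic below is a sketch under two assumptions not stated in this card: training ran on a single device, and the linear scheduler warms up linearly before decaying linearly to zero. The resulting 438 steps per epoch agrees with the epoch fractions in the training logs (step 100 at epoch 0.2283).

```python
import math

train_samples = 7004    # "Size: 7,004 training samples"
batch_size = 16         # per_device_train_batch_size
num_epochs = 100        # num_train_epochs
warmup_ratio = 0.1      # warmup_ratio
peak_lr = 2e-05         # learning_rate
num_devices = 1         # assumption: single device, not stated in the card

steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
warmup_steps = round(total_steps * warmup_ratio)

def linear_warmup_lr(step: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch)         # 438
print(total_steps)             # 43800
print(warmup_steps)            # 4380
print(linear_warmup_lr(1700))  # learning rate at the last logged step
```

Under these assumptions the last logged step (1700) still falls inside the 4,380-step warmup phase, so the learning rate never reached its 2e-05 peak during the logged portion of training.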
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 100
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | quora-duplicates-dev_max_ap |
---|---|---|---|---|
0 | 0 | - | - | 0.6829 |
0.2283 | 100 | - | 0.6795 | 0.6829 |
0.4566 | 200 | - | 0.6664 | 0.6873 |
0.6849 | 300 | - | 0.6426 | 0.7011 |
0.9132 | 400 | - | 0.5995 | 0.7190 |
1.1416 | 500 | 0.6452 | 0.5537 | 0.7410 |
1.3699 | 600 | - | 0.5262 | 0.7525 |
1.5982 | 700 | - | 0.5199 | 0.7594 |
1.8265 | 800 | - | 0.5206 | 0.7655 |
2.0548 | 900 | - | 0.5340 | 0.7745 |
2.2831 | 1000 | 0.4654 | 0.5433 | 0.7790 |
2.5114 | 1100 | - | 0.5683 | 0.7728 |
2.7397 | 1200 | - | 0.5629 | 0.7774 |
2.9680 | 1300 | - | 0.5715 | 0.7732 |
3.1963 | 1400 | - | 0.6772 | 0.7777 |
3.4247 | 1500 | 0.3219 | 0.6834 | 0.7844 |
3.6530 | 1600 | - | 0.7428 | 0.7792 |
3.8813 | 1700 | - | 0.7353 | 0.7594 |
- The bold row denotes the saved checkpoint.
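With `load_best_model_at_end: True` and no other selection metric listed among the hyperparameters, a Hugging Face trainer keeps the checkpoint with the lowest evaluation loss. The sketch below applies that rule to the rows of the log above (the fourth column is the evaluation loss). It is an illustration of the selection rule under that assumption; the bold marking did not survive in this copy of the card, so which row was originally highlighted cannot be confirmed from the text.

```python
# (step, evaluation loss, quora-duplicates-dev_max_ap), copied from the training log
log = [
    (100, 0.6795, 0.6829), (200, 0.6664, 0.6873), (300, 0.6426, 0.7011),
    (400, 0.5995, 0.7190), (500, 0.5537, 0.7410), (600, 0.5262, 0.7525),
    (700, 0.5199, 0.7594), (800, 0.5206, 0.7655), (900, 0.5340, 0.7745),
    (1000, 0.5433, 0.7790), (1100, 0.5683, 0.7728), (1200, 0.5629, 0.7774),
    (1300, 0.5715, 0.7732), (1400, 0.6772, 0.7777), (1500, 0.6834, 0.7844),
    (1600, 0.7428, 0.7792), (1700, 0.7353, 0.7594),
]

# Select the row with the lowest evaluation loss
best_step, best_loss, best_ap = min(log, key=lambda row: row[1])
print(best_step, best_loss, best_ap)  # 700 0.5199 0.7594
```

The average precision at that step, 0.7594, matches the `max_ap` value reported in the evaluation table.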
Framework Versions
- Python: 3.10.10
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.2.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers and SoftmaxLoss
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}