SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
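
The Pooling and Normalize stages listed above can be illustrated with a small sketch. This is not the library's implementation: it is a pure-Python toy that assumes mean pooling over non-padding tokens followed by L2 normalization, and the helper names `mean_pool` and `l2_normalize` are hypothetical.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

# Toy input: 3 token positions with 2 dimensions each (the real model uses 384).
# The last position is padding and is ignored by the mask.
tokens = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final stage normalizes every embedding to unit length, cosine similarity between two outputs reduces to a plain dot product.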

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-knn-v3")
# Run inference
sentences = [
    'What is the break point conversion for Sinner?',
    'Show me how many winners',
    'service for Sinner?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5908, 0.2891],
#         [0.5908, 1.0000, 0.4187],
#         [0.2891, 0.4187, 1.0000]])
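
One way to read the similarity matrix is as a nearest-neighbor lookup: for each sentence, pick the other sentence with the highest score. The sketch below hard-codes the scores printed above so it runs without downloading the model; `nearest_neighbors` is a hypothetical helper, not part of the Sentence Transformers API.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.5908, 0.2891],
    [0.5908, 1.0000, 0.4187],
    [0.2891, 0.4187, 1.0000],
]

def nearest_neighbors(scores, k=1):
    """For each row, return the indices of the k most similar other rows."""
    result = []
    for i, row in enumerate(scores):
        # Exclude the diagonal entry: every sentence matches itself with score 1.
        others = [(score, j) for j, score in enumerate(row) if j != i]
        others.sort(reverse=True)
        result.append([j for _, j in others[:k]])
    return result

neighbors = nearest_neighbors(similarities)
print(neighbors)
# [[1], [0], [1]]
```

With real data, the same ranking would typically be run between a query embedding and a set of stored example embeddings rather than within one small batch.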

Training Details

Training Dataset

Unnamed Dataset

Size: 11,641 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens, mean: 10.77 tokens, max: 26 tokens	min: 4 tokens, mean: 8.62 tokens, max: 26 tokens	min: 4 tokens, mean: 10.51 tokens, max: 26 tokens

Samples:

anchor	positive	negative
`What about he's odds?`	`momentum shift?`	`What happened to he?`
`How far has Nardi advanced at Wimbledon in his best run?`	`how many titles?`	`What is the what court for he?`
`How effective is Swiatek's return in the match?`	`How effective is he's return in the match?`	`How effective is his return in the game`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
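
To make the two parameters concrete, here is a minimal pure-Python sketch of how a loss of this kind is commonly computed for (anchor, positive, negative) triplets. It assumes that each anchor is scored against every positive and every negative in the batch, that the cosine similarities are multiplied by `scale`, and that cross-entropy is taken with the anchor's own positive as the target. The function `mnr_loss` is a hypothetical name, and the library's actual implementation operates on batched tensors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over anchors, using in-batch candidates as negatives."""
    candidates = positives + negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # The correct candidate for anchor i is positive i.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-dimensional embeddings, for illustration only.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnr_loss(anchors, positives, negatives)
swapped = mnr_loss(anchors, positives[::-1], negatives)
print(aligned < swapped)
```

The loss is small when each anchor is closest to its own positive and grows when another candidate in the batch scores higher, which is why a larger batch supplies more negatives per anchor.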

Evaluation Dataset

Unnamed Dataset

Size: 2,911 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens, mean: 11.06 tokens, max: 26 tokens	min: 4 tokens, mean: 8.7 tokens, max: 21 tokens	min: 4 tokens, mean: 10.53 tokens, max: 28 tokens

Samples:

anchor	positive	negative
`what venue`	`Show me what venue`	`venue time?`
`2025 for he?`	`how many titles?`	`Show me which court`
`What about Djokovic's debut?`	`What about he's debut?`	`What about Djokovic's momentum?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 15
warmup_ratio: 0.1
fp16: True
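
These values, together with the training set size, determine the length of the run and the learning-rate schedule. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and a linear schedule with warmup (lr_scheduler_type: linear). The function `linear_lr` is a hypothetical helper written for illustration, not a library call.

```python
import math

train_samples = 11_641   # training set size reported in this card
batch_size = 16          # per_device_train_batch_size
epochs = 15              # num_train_epochs
warmup_ratio = 0.1       # warmup_ratio
peak_lr = 2e-05          # learning_rate

# Assumption: one device and gradient_accumulation_steps = 1,
# so one optimizer step consumes one batch of 16 samples.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
# 728 10920 1092
```

The 728 steps per epoch agree with the training logs, where step 50 is reported at epoch 0.0687 (50 / 728).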

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 15
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss
0.0687	50	5.002
0.1374	100	4.122
0.2060	150	3.3282
0.2747	200	2.5309
0.3434	250	1.9021
0.4121	300	1.7012
0.4808	350	1.4657
0.5495	400	1.433
0.6181	450	1.5156
0.6868	500	1.3941
0.7555	550	1.2544
0.8242	600	1.1585
0.8929	650	1.0916
0.9615	700	0.9743
1.0302	750	1.0443
1.0989	800	0.9942
1.1676	850	1.0508
1.2363	900	0.9211
1.3049	950	0.9522
1.3736	1000	0.804
1.4423	1050	0.8645
1.5110	1100	0.8335
1.5797	1150	0.7337
1.6484	1200	0.7857
1.7170	1250	0.8482
1.7857	1300	0.7211
1.8544	1350	0.7442
1.9231	1400	0.7557
1.9918	1450	0.7323
2.0604	1500	0.677
2.1291	1550	0.6635
2.1978	1600	0.71
2.2665	1650	0.6193
2.3352	1700	0.6792
2.4038	1750	0.7151
2.4725	1800	0.6825
2.5412	1850	0.6452
2.6099	1900	0.666
2.6786	1950	0.5733
2.7473	2000	0.5546
2.8159	2050	0.6443
2.8846	2100	0.6835
2.9533	2150	0.6499
3.0220	2200	0.6229
3.0907	2250	0.6151
3.1593	2300	0.539
3.2280	2350	0.5997
3.2967	2400	0.571
3.3654	2450	0.6257
3.4341	2500	0.6222
3.5027	2550	0.6102
3.5714	2600	0.6575
3.6401	2650	0.5844
3.7088	2700	0.5439
3.7775	2750	0.5528
3.8462	2800	0.5894
3.9148	2850	0.6576
3.9835	2900	0.6063
4.0522	2950	0.5556
4.1209	3000	0.5872
4.1896	3050	0.544
4.2582	3100	0.5114
4.3269	3150	0.587
4.3956	3200	0.5392
4.4643	3250	0.5846
4.5330	3300	0.6077
4.6016	3350	0.6597
4.6703	3400	0.5425
4.7390	3450	0.5493
4.8077	3500	0.5291
4.8764	3550	0.5145
4.9451	3600	0.5534
5.0137	3650	0.5018
5.0824	3700	0.4948
5.1511	3750	0.553
5.2198	3800	0.5772
5.2885	3850	0.5264
5.3571	3900	0.5516
5.4258	3950	0.5303
5.4945	4000	0.5213
5.5632	4050	0.5558
5.6319	4100	0.4956
5.7005	4150	0.6035
5.7692	4200	0.5706
5.8379	4250	0.4922
5.9066	4300	0.5965
5.9753	4350	0.5143
6.0440	4400	0.5798
6.1126	4450	0.5219
6.1813	4500	0.5803
6.25	4550	0.5035
6.3187	4600	0.5534
6.3874	4650	0.546
6.4560	4700	0.525
6.5247	4750	0.4751
6.5934	4800	0.5085
6.6621	4850	0.5282
6.7308	4900	0.5845
6.7995	4950	0.5153
6.8681	5000	0.5399
6.9368	5050	0.5532
7.0055	5100	0.5005
7.0742	5150	0.5273
7.1429	5200	0.5212
7.2115	5250	0.5245
7.2802	5300	0.5075
7.3489	5350	0.5687
7.4176	5400	0.4674
7.4863	5450	0.5115
7.5549	5500	0.4938
7.6236	5550	0.5059
7.6923	5600	0.5065
7.7610	5650	0.5252
7.8297	5700	0.4852
7.8984	5750	0.48
7.9670	5800	0.5503
8.0357	5850	0.5164
8.1044	5900	0.5756
8.1731	5950	0.5175
8.2418	6000	0.5033
8.3104	6050	0.4992
8.3791	6100	0.5299
8.4478	6150	0.4862
8.5165	6200	0.548
8.5852	6250	0.454
8.6538	6300	0.4941
8.7225	6350	0.5088
8.7912	6400	0.5065
8.8599	6450	0.4921
8.9286	6500	0.4756
8.9973	6550	0.5258
9.0659	6600	0.4658
9.1346	6650	0.4894
9.2033	6700	0.5097
9.2720	6750	0.493
9.3407	6800	0.5311
9.4093	6850	0.5157
9.4780	6900	0.5142
9.5467	6950	0.4664
9.6154	7000	0.528
9.6841	7050	0.5645
9.7527	7100	0.5214
9.8214	7150	0.4777
9.8901	7200	0.5449
9.9588	7250	0.492
10.0275	7300	0.4591
10.0962	7350	0.4576
10.1648	7400	0.4692
10.2335	7450	0.5415
10.3022	7500	0.4803
10.3709	7550	0.5487
10.4396	7600	0.5706
10.5082	7650	0.4815
10.5769	7700	0.4585
10.6456	7750	0.4861
10.7143	7800	0.4247
10.7830	7850	0.4906
10.8516	7900	0.5371
10.9203	7950	0.5393
10.9890	8000	0.4788
11.0577	8050	0.5038
11.1264	8100	0.4838
11.1951	8150	0.515
11.2637	8200	0.5299
11.3324	8250	0.5044
11.4011	8300	0.5045
11.4698	8350	0.465
11.5385	8400	0.5253
11.6071	8450	0.4517
11.6758	8500	0.5048
11.7445	8550	0.4733
11.8132	8600	0.47
11.8819	8650	0.4552
11.9505	8700	0.4203
12.0192	8750	0.395
12.0879	8800	0.5411
12.1566	8850	0.4911
12.2253	8900	0.4641
12.2940	8950	0.4608
12.3626	9000	0.4839
12.4313	9050	0.4491
12.5	9100	0.517
12.5687	9150	0.5031
12.6374	9200	0.4869
12.7060	9250	0.4856
12.7747	9300	0.4754
12.8434	9350	0.5167
12.9121	9400	0.5004
12.9808	9450	0.5293
13.0495	9500	0.4566
13.1181	9550	0.477
13.1868	9600	0.4501
13.2555	9650	0.4791
13.3242	9700	0.4746
13.3929	9750	0.4702
13.4615	9800	0.469
13.5302	9850	0.5046
13.5989	9900	0.4895
13.6676	9950	0.5223
13.7363	10000	0.4245
13.8049	10050	0.4701
13.8736	10100	0.4548
13.9423	10150	0.4998
14.0110	10200	0.4345
14.0797	10250	0.4371
14.1484	10300	0.5009
14.2170	10350	0.4816
14.2857	10400	0.4665
14.3544	10450	0.5047
14.4231	10500	0.5132
14.4918	10550	0.473
14.5604	10600	0.4387
14.6291	10650	0.4775
14.6978	10700	0.4522
14.7665	10750	0.4807
14.8352	10800	0.482
14.9038	10850	0.4625
14.9725	10900	0.5052

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.0.0
Transformers: 4.57.6
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}