SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
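To make the last two modules concrete, here is a minimal pure-Python sketch of what mean pooling followed by normalization does to a sequence of token vectors. The function names `mean_pool` and `l2_normalize` and the toy 4-dimensional input are illustrative assumptions, not part of the library; the real modules operate on batched 384-dimensional tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: 3 token positions (the last one is padding), 4 dimensions instead of 384.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # padding row is ignored
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled)
print(sentence_embedding)
```

Because the final vectors have unit length, the cosine similarity between two embeddings equals their dot product.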

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-mlp-v2")
# Run inference
sentences = [
    'What is the can he win for Djokovic?',
    'form shift?',
    'What is the set time for the player?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6634, 0.0789],
#         [0.6634, 1.0000, 0.1159],
#         [0.0789, 0.1159, 1.0000]])
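One common way to use such a similarity matrix is to look up, for each sentence, its closest neighbour. The sketch below does this in plain Python using the scores printed above; `most_similar` is a hypothetical helper, not a library function. With a real tensor returned by `model.similarity`, calling `.tolist()` on it should give the same nested-list shape.

```python
def most_similar(similarity_matrix, query_index):
    """Return (index, score) of the best-scoring entry other than the query itself."""
    scores = similarity_matrix[query_index]
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_index]
    return max(candidates, key=lambda pair: pair[1])

# Scores copied from the printed tensor in the example above.
similarities = [
    [1.0000, 0.6634, 0.0789],
    [0.6634, 1.0000, 0.1159],
    [0.0789, 0.1159, 1.0000],
]
sentences = [
    "What is the can he win for Djokovic?",
    "form shift?",
    "What is the set time for the player?",
]

for i, sentence in enumerate(sentences):
    j, score = most_similar(similarities, i)
    print(f"{sentence!r} -> {sentences[j]!r} (score {score:.4f})")
```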

Training Details

Training Dataset

Unnamed Dataset

Size: 11,600 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens mean: 10.77 tokens max: 28 tokens	min: 4 tokens mean: 8.81 tokens max: 28 tokens	min: 4 tokens mean: 10.67 tokens max: 28 tokens

Samples:

anchor	positive	negative
`What is the overall return for Djokovic?`	`overall for Djokovic?`	`What is the return winners for Djokovic?`
`What is the return winner count for Alcaraz and Fritz?`	`how many winners?`	`What is the how good is his return for Sinner?`
`backhand for he?`	`What is the backhand quality for he?`	`What is the backhand today for he?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
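As a rough illustration of these two parameters, the sketch below computes a loss of this kind in plain Python: each anchor is scored against every positive and negative in the batch with cosine similarity, the scores are multiplied by the scale, and cross-entropy is taken with the anchor's own positive as the correct class. The names `cos_sim` and `mnr_loss` and the 2-dimensional toy vectors are assumptions for illustration; the library version works on batched tensors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should rank positive i above all other candidates."""
    candidates = positives + negatives  # in-batch positives first, then the explicit negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

well_separated = mnr_loss(anchors, positives, negatives)
confused = mnr_loss(anchors, positives[::-1], negatives)  # positives swapped between anchors
print(well_separated, confused)
```

A larger scale sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.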

Evaluation Dataset

Unnamed Dataset

Size: 2,900 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens mean: 10.9 tokens max: 28 tokens	min: 4 tokens mean: 8.62 tokens max: 26 tokens	min: 4 tokens mean: 10.38 tokens max: 25 tokens

Samples:

anchor	positive	negative
`How does Shelton's game match up against Lorenzo Sonego's strengths?`	`key factors?`	`What is the date of birth for Djokovic?`
`What is the what are the key for Sinner?`	`What's the what are the key for Sinner?`	`What are the what is a for Sinner?`
`professional career stats?`	`professional career titles?`	`How does Shelton's forehand compare to their career average?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 15
warmup_ratio: 0.1
fp16: True
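These settings, together with the dataset size, determine the step counts and the learning-rate curve. The sketch below works them out under some assumptions: a single device, no gradient accumulation, and a linear schedule whose warmup length is the rounded-up product of total steps and warmup ratio. The helper `learning_rate` is hypothetical and only mirrors the usual linear warmup and decay shape.

```python
import math

TRAIN_SAMPLES = 11_600   # from the training dataset section
BATCH_SIZE = 16          # per_device_train_batch_size
EPOCHS = 15              # num_train_epochs
BASE_LR = 2e-05          # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio

# Assumes one device and gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return BASE_LR * max(0.0, remaining)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 500, warmup_steps, 5000, total_steps):
    print(step, learning_rate(step))
```

Under these assumptions one epoch is 725 steps, which agrees with the training log, where step 1450 corresponds to epoch 2.0.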

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 15
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss
0.0690	50	4.8633
0.1379	100	4.2929
0.2069	150	3.2473
0.2759	200	2.4133
0.3448	250	2.0601
0.4138	300	1.7225
0.4828	350	1.631
0.5517	400	1.5036
0.6207	450	1.3556
0.6897	500	1.2699
0.7586	550	1.3131
0.8276	600	1.1743
0.8966	650	1.0491
0.9655	700	1.2265
1.0345	750	1.0786
1.1034	800	1.0451
1.1724	850	1.0379
1.2414	900	0.9378
1.3103	950	0.8659
1.3793	1000	0.8908
1.4483	1050	0.8333
1.5172	1100	0.7814
1.5862	1150	0.7764
1.6552	1200	0.8071
1.7241	1250	0.7394
1.7931	1300	0.7137
1.8621	1350	0.7669
1.9310	1400	0.6652
2.0	1450	0.7612
2.0690	1500	0.6847
2.1379	1550	0.6511
2.2069	1600	0.7297
2.2759	1650	0.6836
2.3448	1700	0.6733
2.4138	1750	0.6125
2.4828	1800	0.664
2.5517	1850	0.6212
2.6207	1900	0.6613
2.6897	1950	0.645
2.7586	2000	0.6311
2.8276	2050	0.6823
2.8966	2100	0.6608
2.9655	2150	0.6408
3.0345	2200	0.6364
3.1034	2250	0.5752
3.1724	2300	0.6431
3.2414	2350	0.585
3.3103	2400	0.6852
3.3793	2450	0.6743
3.4483	2500	0.5907
3.5172	2550	0.5632
3.5862	2600	0.5853
3.6552	2650	0.5906
3.7241	2700	0.6471
3.7931	2750	0.5809
3.8621	2800	0.5832
3.9310	2850	0.6011
4.0	2900	0.5926
4.0690	2950	0.5962
4.1379	3000	0.6648
4.2069	3050	0.5759
4.2759	3100	0.5162
4.3448	3150	0.5945
4.4138	3200	0.5859
4.4828	3250	0.6066
4.5517	3300	0.5536
4.6207	3350	0.5112
4.6897	3400	0.5094
4.7586	3450	0.5056
4.8276	3500	0.573
4.8966	3550	0.5425
4.9655	3600	0.5641
5.0345	3650	0.5409
5.1034	3700	0.58
5.1724	3750	0.5669
5.2414	3800	0.6087
5.3103	3850	0.557
5.3793	3900	0.5191
5.4483	3950	0.5321
5.5172	4000	0.5965
5.5862	4050	0.5612
5.6552	4100	0.6181
5.7241	4150	0.5144
5.7931	4200	0.5187
5.8621	4250	0.5362
5.9310	4300	0.5215
6.0	4350	0.5578
6.0690	4400	0.5291
6.1379	4450	0.512
6.2069	4500	0.5702
6.2759	4550	0.5935
6.3448	4600	0.5376
6.4138	4650	0.5012
6.4828	4700	0.6246
6.5517	4750	0.5038
6.6207	4800	0.5739
6.6897	4850	0.5765
6.7586	4900	0.58
6.8276	4950	0.5462
6.8966	5000	0.5087
6.9655	5050	0.5357
7.0345	5100	0.5352
7.1034	5150	0.5002
7.1724	5200	0.5196
7.2414	5250	0.5668
7.3103	5300	0.5104
7.3793	5350	0.5029
7.4483	5400	0.481
7.5172	5450	0.5567
7.5862	5500	0.5425
7.6552	5550	0.4884
7.7241	5600	0.4854
7.7931	5650	0.5459
7.8621	5700	0.5201
7.9310	5750	0.5288
8.0	5800	0.5055
8.0690	5850	0.4656
8.1379	5900	0.5538
8.2069	5950	0.5513
8.2759	6000	0.5078
8.3448	6050	0.508
8.4138	6100	0.5403
8.4828	6150	0.4711
8.5517	6200	0.5024
8.6207	6250	0.4886
8.6897	6300	0.5446
8.7586	6350	0.4953
8.8276	6400	0.5395
8.8966	6450	0.571
8.9655	6500	0.567
9.0345	6550	0.5684
9.1034	6600	0.543
9.1724	6650	0.5449
9.2414	6700	0.4713
9.3103	6750	0.5046
9.3793	6800	0.5785
9.4483	6850	0.4744
9.5172	6900	0.5364
9.5862	6950	0.5523
9.6552	7000	0.5245
9.7241	7050	0.5005
9.7931	7100	0.5355
9.8621	7150	0.5248
9.9310	7200	0.4924
10.0	7250	0.4885
10.0690	7300	0.4708
10.1379	7350	0.5075
10.2069	7400	0.4943
10.2759	7450	0.4926
10.3448	7500	0.4757
10.4138	7550	0.5305
10.4828	7600	0.4626
10.5517	7650	0.5161
10.6207	7700	0.48
10.6897	7750	0.466
10.7586	7800	0.5556
10.8276	7850	0.51
10.8966	7900	0.5185
10.9655	7950	0.5485
11.0345	8000	0.4591
11.1034	8050	0.523
11.1724	8100	0.5295
11.2414	8150	0.4482
11.3103	8200	0.5275
11.3793	8250	0.4849
11.4483	8300	0.5374
11.5172	8350	0.4621
11.5862	8400	0.4374
11.6552	8450	0.4855
11.7241	8500	0.5147
11.7931	8550	0.564
11.8621	8600	0.4763
11.9310	8650	0.4456
12.0	8700	0.4906
12.0690	8750	0.4912
12.1379	8800	0.4556
12.2069	8850	0.4936
12.2759	8900	0.4864
12.3448	8950	0.5262
12.4138	9000	0.458
12.4828	9050	0.5631
12.5517	9100	0.5144
12.6207	9150	0.4966
12.6897	9200	0.5589
12.7586	9250	0.4718
12.8276	9300	0.5124
12.8966	9350	0.5362
12.9655	9400	0.482
13.0345	9450	0.4821
13.1034	9500	0.4984
13.1724	9550	0.4646
13.2414	9600	0.4825
13.3103	9650	0.4957
13.3793	9700	0.4739
13.4483	9750	0.523
13.5172	9800	0.4892
13.5862	9850	0.4803
13.6552	9900	0.502
13.7241	9950	0.4828
13.7931	10000	0.5034
13.8621	10050	0.5151
13.9310	10100	0.5292
14.0	10150	0.5227
14.0690	10200	0.4853
14.1379	10250	0.4528
14.2069	10300	0.4591
14.2759	10350	0.4482
14.3448	10400	0.4412
14.4138	10450	0.4854
14.4828	10500	0.4734
14.5517	10550	0.4749
14.6207	10600	0.5448
14.6897	10650	0.5117
14.7586	10700	0.4776
14.8276	10750	0.4638
14.8966	10800	0.5636
14.9655	10850	0.547

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.0.0
Transformers: 4.57.6
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}