SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
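Cosine similarity is the dot product of two vectors divided by the product of their lengths. The minimal sketch below uses plain Python and made-up 3-dimensional toy vectors rather than real 384-dimensional embeddings. It shows that once vectors are scaled to unit length, the cosine reduces to a plain dot product. That is why the Normalize() module at the end of this model's architecture makes the two interchangeable. The helper names are illustrative only.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy vectors standing in for sentence embeddings (hypothetical values).
u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, 1.0]

print(round(cosine_similarity(u, v), 4))  # 0.5963
# For unit-length vectors, the plain dot product gives the same value.
print(round(dot(l2_normalize(u), l2_normalize(v)), 4))  # 0.5963
```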

Model Sources

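Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)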

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
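The pooling configuration above enables only pooling_mode_mean_tokens. A sentence embedding is therefore the average of the token embeddings produced by the BertModel, counting only real tokens and not padding, followed by L2 normalization. The plain-Python sketch below illustrates that computation on a hypothetical two-sentence batch with 4-dimensional toy token vectors. It is a simplified illustration, not the library's implementation, which operates on PyTorch tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + x for s, x in zip(summed, vec)]
            count += 1
    # Guard against an all-padding row (division by zero).
    return [s / max(count, 1) for s in summed]


def normalize(v):
    """L2-normalize, mirroring the role of the Normalize() module."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical batch: 2 sentences padded to 3 tokens, 4-dim token vectors.
batch = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0]],
]
masks = [[1, 1, 0], [1, 1, 1]]  # the first sentence's third token is padding

embeddings = [normalize(mean_pool(toks, m)) for toks, m in zip(batch, masks)]
for e in embeddings:
    print([round(x, 4) for x in e])
```

Because the padded position is masked out, the first sentence averages only its two real tokens.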

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-v4")
# Run inference
sentences = [
    'Show me previous game result',
    'what venue',
    'How is the tactical battle between the player and Amanda Anismova playing out?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6952, -0.0128],
#         [ 0.6952,  1.0000,  0.0505],
#         [-0.0128,  0.0505,  1.0000]])
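For semantic search, the same embeddings can rank a corpus against a query. The sketch below is a plain-Python illustration of top-k retrieval by cosine similarity over already-normalized vectors. The 3-dimensional vectors and corpus strings are hypothetical placeholders. In practice they would come from model.encode(...), and the library also provides sentence_transformers.util.semantic_search for this task.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dot product equals the cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query_emb, corpus_embs, k=2):
    """Return (index, score) pairs for the k corpus entries most similar to the query.

    Assumes every vector is already unit length, so the dot product is the cosine.
    """
    scores = [sum(q * c for q, c in zip(query_emb, emb)) for emb in corpus_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Hypothetical corpus and toy embeddings (real ones would be 384-dimensional).
corpus = ["next match for Sinner", "current ranking", "venue of the final"]
corpus_embs = [normalize(v) for v in ([1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0])]
query_emb = normalize([0.9, 0.2, 0.1])

for idx, score in top_k(query_emb, corpus_embs, k=2):
    print(f"{score:.3f}  {corpus[idx]}")
```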

Training Details

Training Dataset

Unnamed Dataset

Size: 11,600 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	4	4	4
mean tokens	10.75	8.66	10.45
max tokens	26	23	23

Samples:

anchor	positive	negative
`What is the this season for Djokovic?`	`What's the this season for Djokovic?`	`What is the attacking this set for Djokovic?`
`who is projected?`	`momentum shift?`	`How does she's path to this round compare to Amanda Anismova's?`
`What's the sets won for Sinner?`	`Show me how many winners`	`What's the last year for Djokovic?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
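MultipleNegativesRankingLoss treats each (anchor, positive) pair as a classification problem over the batch. The anchor must score its own positive above every other positive and every hard negative in the batch. With scale 20.0 and cosine similarity, the logits are 20 × cos(anchor, candidate), and these are fed into a softmax cross-entropy. With per_device_train_batch_size 16, each anchor is compared against 32 candidates (16 positives plus 16 negatives).

The plain-Python sketch below illustrates that computation on toy 2-dimensional vectors. It is a simplified re-derivation for illustration, not the library's code, and the vectors are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """In-batch softmax cross-entropy: anchor i's correct candidate is positive i.

    Candidates are all positives followed by all hard negatives in the batch.
    """
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax probability of the true positive.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch of 2 triplets (hypothetical vectors).
anchors = [[1.0, 0.0], [0.0, 1.0]]
good_positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.2], [0.2, -1.0]]

print(round(mnrl_loss(anchors, good_positives, negatives), 4))
# Swapping the positives makes each anchor's true match a poor candidate,
# so the loss rises sharply.
print(round(mnrl_loss(anchors, good_positives[::-1], negatives), 4))
```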

Evaluation Dataset

Unnamed Dataset

Size: 2,900 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	4	4	4
mean tokens	10.63	8.64	10.24
max tokens	28	26	24

Samples:

anchor	positive	negative
`What about Djokovic's games?`	`What's the how many winners for Djokovic?`	`ranking for the player?`
`What is the next match for Djokovic?`	`What are the next match for Djokovic?`	`What is the pre match for Djokovic?`
`What are the gaining momentum for Sinner?`	`What is the gaining momentum for Sinner?`	`What are the gaining control for Sinner?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 15
warmup_ratio: 0.1
fp16: True
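These settings determine the optimization schedule. The sketch below assumes a single device with gradient_accumulation_steps 1. The training logs are consistent with this assumption, since epoch 1.0 falls at step 725 and epoch 2.0 at step 1450.

Under that assumption, 11,600 samples at batch size 16 give 725 steps per epoch and 10,875 steps over 15 epochs. With warmup_ratio 0.1, the Hugging Face Trainer rounds the warmup up to 1,088 steps. The linear scheduler then decays the learning rate to zero.

The sketch reproduces that arithmetic in plain Python. The function and constant names are illustrative and not part of any library.

```python
import math

NUM_SAMPLES = 11_600
BATCH_SIZE = 16
EPOCHS = 15
WARMUP_RATIO = 0.1
PEAK_LR = 2e-5

# dataloader_drop_last is False, so a final partial batch would still count as a step.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 725 10875 1088
print(linear_lr(warmup_steps), linear_lr(total_steps))
```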

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 15
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs


Epoch	Step	Training Loss
0.0690	50	5.1095
0.1379	100	3.9909
0.2069	150	3.1963
0.2759	200	2.3301
0.3448	250	1.9904
0.4138	300	1.6705
0.4828	350	1.5659
0.5517	400	1.5497
0.6207	450	1.3563
0.6897	500	1.2982
0.7586	550	1.2509
0.8276	600	1.1737
0.8966	650	1.1486
0.9655	700	1.192
1.0345	750	0.9715
1.1034	800	1.0054
1.1724	850	1.0102
1.2414	900	0.9393
1.3103	950	0.9119
1.3793	1000	0.8589
1.4483	1050	0.9049
1.5172	1100	0.8774
1.5862	1150	0.8488
1.6552	1200	0.8382
1.7241	1250	0.7437
1.7931	1300	0.8023
1.8621	1350	0.7775
1.9310	1400	0.7756
2.0	1450	0.7273
2.0690	1500	0.6275
2.1379	1550	0.7331
2.2069	1600	0.629
2.2759	1650	0.7127
2.3448	1700	0.6503
2.4138	1750	0.7082
2.4828	1800	0.6939
2.5517	1850	0.6993
2.6207	1900	0.7067
2.6897	1950	0.6622
2.7586	2000	0.6499
2.8276	2050	0.6923
2.8966	2100	0.6208
2.9655	2150	0.5925
3.0345	2200	0.6697
3.1034	2250	0.6458
3.1724	2300	0.5709
3.2414	2350	0.5987
3.3103	2400	0.6252
3.3793	2450	0.6377
3.4483	2500	0.5739
3.5172	2550	0.6281
3.5862	2600	0.6186
3.6552	2650	0.5828
3.7241	2700	0.678
3.7931	2750	0.6257
3.8621	2800	0.5704
3.9310	2850	0.6151
4.0	2900	0.5898
4.0690	2950	0.5277
4.1379	3000	0.6128
4.2069	3050	0.6306
4.2759	3100	0.5739
4.3448	3150	0.5396
4.4138	3200	0.617
4.4828	3250	0.5119
4.5517	3300	0.6136
4.6207	3350	0.6303
4.6897	3400	0.6138
4.7586	3450	0.6214
4.8276	3500	0.5686
4.8966	3550	0.5901
4.9655	3600	0.6913
5.0345	3650	0.5706
5.1034	3700	0.6082
5.1724	3750	0.4755
5.2414	3800	0.5777
5.3103	3850	0.5515
5.3793	3900	0.5271
5.4483	3950	0.5816
5.5172	4000	0.5787
5.5862	4050	0.568
5.6552	4100	0.5593
5.7241	4150	0.542
5.7931	4200	0.5873
5.8621	4250	0.5647
5.9310	4300	0.6369
6.0	4350	0.5775
6.0690	4400	0.5324
6.1379	4450	0.5463
6.2069	4500	0.5234
6.2759	4550	0.4921
6.3448	4600	0.5716
6.4138	4650	0.6321
6.4828	4700	0.4881
6.5517	4750	0.5717
6.6207	4800	0.5922
6.6897	4850	0.5289
6.7586	4900	0.5182
6.8276	4950	0.5096
6.8966	5000	0.6062
6.9655	5050	0.6014
7.0345	5100	0.5033
7.1034	5150	0.4994
7.1724	5200	0.5842
7.2414	5250	0.5317
7.3103	5300	0.5112
7.3793	5350	0.5188
7.4483	5400	0.6174
7.5172	5450	0.484
7.5862	5500	0.5571
7.6552	5550	0.5043
7.7241	5600	0.5341
7.7931	5650	0.5225
7.8621	5700	0.5618
7.9310	5750	0.5537
8.0	5800	0.5811
8.0690	5850	0.5311
8.1379	5900	0.5585
8.2069	5950	0.5564
8.2759	6000	0.5469
8.3448	6050	0.5726
8.4138	6100	0.5329
8.4828	6150	0.55
8.5517	6200	0.5365
8.6207	6250	0.5847
8.6897	6300	0.5204
8.7586	6350	0.5112
8.8276	6400	0.5468
8.8966	6450	0.4871
8.9655	6500	0.5449
9.0345	6550	0.5237
9.1034	6600	0.5232
9.1724	6650	0.5075
9.2414	6700	0.5078
9.3103	6750	0.5366
9.3793	6800	0.5636
9.4483	6850	0.4743
9.5172	6900	0.4776
9.5862	6950	0.5571
9.6552	7000	0.56
9.7241	7050	0.5054
9.7931	7100	0.5431
9.8621	7150	0.5358
9.9310	7200	0.5395
10.0	7250	0.5394
10.0690	7300	0.57
10.1379	7350	0.4883
10.2069	7400	0.4884
10.2759	7450	0.4587
10.3448	7500	0.5076
10.4138	7550	0.5108
10.4828	7600	0.565
10.5517	7650	0.503
10.6207	7700	0.5645
10.6897	7750	0.509
10.7586	7800	0.4993
10.8276	7850	0.5464
10.8966	7900	0.5293
10.9655	7950	0.5384
11.0345	8000	0.5245
11.1034	8050	0.4647
11.1724	8100	0.4983
11.2414	8150	0.5168
11.3103	8200	0.5455
11.3793	8250	0.5069
11.4483	8300	0.5523
11.5172	8350	0.4875
11.5862	8400	0.4947
11.6552	8450	0.5022
11.7241	8500	0.5096
11.7931	8550	0.5768
11.8621	8600	0.5187
11.9310	8650	0.4883
12.0	8700	0.5039
12.0690	8750	0.527
12.1379	8800	0.5382
12.2069	8850	0.4912
12.2759	8900	0.5144
12.3448	8950	0.532
12.4138	9000	0.5233
12.4828	9050	0.4169
12.5517	9100	0.5278
12.6207	9150	0.5028
12.6897	9200	0.5227
12.7586	9250	0.4812
12.8276	9300	0.5299
12.8966	9350	0.5383
12.9655	9400	0.5245
13.0345	9450	0.5045
13.1034	9500	0.5619
13.1724	9550	0.4969
13.2414	9600	0.508
13.3103	9650	0.5095
13.3793	9700	0.5095
13.4483	9750	0.4886
13.5172	9800	0.5074
13.5862	9850	0.4761
13.6552	9900	0.4805
13.7241	9950	0.4559
13.7931	10000	0.5212
13.8621	10050	0.506
13.9310	10100	0.5086
14.0	10150	0.5232
14.0690	10200	0.5156
14.1379	10250	0.495
14.2069	10300	0.5226
14.2759	10350	0.4842
14.3448	10400	0.4514
14.4138	10450	0.4902
14.4828	10500	0.5068
14.5517	10550	0.5784
14.6207	10600	0.5646
14.6897	10650	0.4994
14.7586	10700	0.552
14.8276	10750	0.5216
14.8966	10800	0.5506
14.9655	10850	0.4286

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.0.0
Transformers: 4.57.6
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}