SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
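
The Pooling and Normalize modules listed above take the mean of the token embeddings (`pooling_mode_mean_tokens: True`) and then scale each sentence vector to unit length. The following is a minimal NumPy sketch of those two steps, not the library's implementation. The function name `mean_pool_and_normalize` is hypothetical, the token embeddings are random placeholders rather than real BertModel output, and excluding padded positions through an attention mask is an assumption about how the mean is taken.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token vectors over non-padded positions, then L2-normalize.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   0/1 array of shape (batch, seq_len); 0 marks padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Placeholder input: 2 sentences, 5 token positions, 384 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0],   # first sentence has 2 padded positions
                 [1, 1, 1, 1, 1]])
sentence_embeddings = mean_pool_and_normalize(tokens, mask)
print(sentence_embeddings.shape)  # (2, 384)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.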

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-knn-v3")
# Run inference
sentences = [
    'What is the break point conversion for Sinner?',
    'Show me how many winners',
    'service for Sinner?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5908, 0.2891],
#         [0.5908, 1.0000, 0.4187],
#         [0.2891, 0.4187, 1.0000]])
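
The similarity scores above are cosine similarities, as stated in the Model Description. The sketch below shows the same computation in NumPy, plus a small nearest-neighbour lookup as one plausible way to use the scores. The card does not describe how the embeddings are used downstream, so the lookup is an assumption. The helper names `cosine_similarity_matrix` and `nearest_neighbors` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real 384-dimensional embeddings.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return the (len(a), len(b)) matrix of cosine similarities."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def nearest_neighbors(query, corpus, k=1):
    """Return indices and scores of the k most similar corpus rows per query."""
    scores = cosine_similarity_matrix(query, corpus)
    top = np.argsort(-scores, axis=1)[:, :k]  # highest similarity first
    return top, np.take_along_axis(scores, top, axis=1)


# Toy vectors standing in for model.encode(...) output.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0]])
query = np.array([[0.9, 0.1, 0.0]])

idx, score = nearest_neighbors(query, corpus, k=2)
print(idx)  # [[0 2]]
```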

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,641 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 4 tokens, mean 10.77 tokens, max 26 tokens
    • positive (string): min 4 tokens, mean 8.62 tokens, max 26 tokens
    • negative (string): min 4 tokens, mean 10.51 tokens, max 26 tokens
  • Samples:
    • anchor: "What about he's odds?" | positive: "momentum shift?" | negative: "What happened to he?"
    • anchor: "How far has Nardi advanced at Wimbledon in his best run?" | positive: "how many titles?" | negative: "What is the what court for he?"
    • anchor: "How effective is Swiatek's return in the match?" | positive: "How effective is he's return in the match?" | negative: "How effective is his return in the game"
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
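
With the parameters above, the loss scores each anchor against every candidate in the batch by cosine similarity, multiplies the scores by `scale` (20.0), and applies cross-entropy so that the anchor's own positive is the correct class. The following NumPy sketch illustrates that idea under stated assumptions and is not the library's code. The function name `mnr_loss` is hypothetical, the embeddings are random placeholders, and treating all in-batch positives and negatives as one candidate pool is an assumption about how the triplets are combined.

```python
import numpy as np


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    Each argument is a float array of shape (batch, dim). For anchor i the
    correct candidate is positives[i]; every other positive and every
    negative in the batch acts as a distractor.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = normalize(anchors)
    candidates = normalize(np.concatenate([positives, negatives], axis=0))
    logits = scale * (a @ candidates.T)             # (batch, 2 * batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(a))                     # positive i sits in column i
    return -log_probs[targets, targets].mean()


# Placeholder batch of 16 triplets with 384-dimensional embeddings.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 384))
positives = anchors + 0.1 * rng.normal(size=(16, 384))  # close to their anchors
negatives = rng.normal(size=(16, 384))
print(mnr_loss(anchors, positives, negatives))
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in the loss.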
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,911 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 4 tokens, mean 11.06 tokens, max 26 tokens
    • positive (string): min 4 tokens, mean 8.7 tokens, max 21 tokens
    • negative (string): min 4 tokens, mean 10.53 tokens, max 28 tokens
  • Samples:
    • anchor: "what venue" | positive: "Show me what venue" | negative: "venue time?"
    • anchor: "2025 for he?" | positive: "how many titles?" | negative: "Show me which court"
    • anchor: "What about Djokovic's debut?" | positive: "What about he's debut?" | negative: "What about Djokovic's momentum?"
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
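
These values can be cross-checked against the step counts in the training logs. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and no dropped last batch (the latter two match `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` in the full list). The `linear_lr` function is a hypothetical helper that approximates a linear schedule with warmup. It is not taken from the training code.

```python
import math

train_samples = 11_641   # training set size from this card
batch_size = 16          # per_device_train_batch_size
epochs = 15              # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-5           # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero (a sketch)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 728 10920 1092
print(round(50 / steps_per_epoch, 4))              # 0.0687
```

The second printed value matches the epoch fraction logged at step 50, which supports the assumed 728 steps per epoch.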

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0687 50 5.002
0.1374 100 4.122
0.2060 150 3.3282
0.2747 200 2.5309
0.3434 250 1.9021
0.4121 300 1.7012
0.4808 350 1.4657
0.5495 400 1.433
0.6181 450 1.5156
0.6868 500 1.3941
0.7555 550 1.2544
0.8242 600 1.1585
0.8929 650 1.0916
0.9615 700 0.9743
1.0302 750 1.0443
1.0989 800 0.9942
1.1676 850 1.0508
1.2363 900 0.9211
1.3049 950 0.9522
1.3736 1000 0.804
1.4423 1050 0.8645
1.5110 1100 0.8335
1.5797 1150 0.7337
1.6484 1200 0.7857
1.7170 1250 0.8482
1.7857 1300 0.7211
1.8544 1350 0.7442
1.9231 1400 0.7557
1.9918 1450 0.7323
2.0604 1500 0.677
2.1291 1550 0.6635
2.1978 1600 0.71
2.2665 1650 0.6193
2.3352 1700 0.6792
2.4038 1750 0.7151
2.4725 1800 0.6825
2.5412 1850 0.6452
2.6099 1900 0.666
2.6786 1950 0.5733
2.7473 2000 0.5546
2.8159 2050 0.6443
2.8846 2100 0.6835
2.9533 2150 0.6499
3.0220 2200 0.6229
3.0907 2250 0.6151
3.1593 2300 0.539
3.2280 2350 0.5997
3.2967 2400 0.571
3.3654 2450 0.6257
3.4341 2500 0.6222
3.5027 2550 0.6102
3.5714 2600 0.6575
3.6401 2650 0.5844
3.7088 2700 0.5439
3.7775 2750 0.5528
3.8462 2800 0.5894
3.9148 2850 0.6576
3.9835 2900 0.6063
4.0522 2950 0.5556
4.1209 3000 0.5872
4.1896 3050 0.544
4.2582 3100 0.5114
4.3269 3150 0.587
4.3956 3200 0.5392
4.4643 3250 0.5846
4.5330 3300 0.6077
4.6016 3350 0.6597
4.6703 3400 0.5425
4.7390 3450 0.5493
4.8077 3500 0.5291
4.8764 3550 0.5145
4.9451 3600 0.5534
5.0137 3650 0.5018
5.0824 3700 0.4948
5.1511 3750 0.553
5.2198 3800 0.5772
5.2885 3850 0.5264
5.3571 3900 0.5516
5.4258 3950 0.5303
5.4945 4000 0.5213
5.5632 4050 0.5558
5.6319 4100 0.4956
5.7005 4150 0.6035
5.7692 4200 0.5706
5.8379 4250 0.4922
5.9066 4300 0.5965
5.9753 4350 0.5143
6.0440 4400 0.5798
6.1126 4450 0.5219
6.1813 4500 0.5803
6.25 4550 0.5035
6.3187 4600 0.5534
6.3874 4650 0.546
6.4560 4700 0.525
6.5247 4750 0.4751
6.5934 4800 0.5085
6.6621 4850 0.5282
6.7308 4900 0.5845
6.7995 4950 0.5153
6.8681 5000 0.5399
6.9368 5050 0.5532
7.0055 5100 0.5005
7.0742 5150 0.5273
7.1429 5200 0.5212
7.2115 5250 0.5245
7.2802 5300 0.5075
7.3489 5350 0.5687
7.4176 5400 0.4674
7.4863 5450 0.5115
7.5549 5500 0.4938
7.6236 5550 0.5059
7.6923 5600 0.5065
7.7610 5650 0.5252
7.8297 5700 0.4852
7.8984 5750 0.48
7.9670 5800 0.5503
8.0357 5850 0.5164
8.1044 5900 0.5756
8.1731 5950 0.5175
8.2418 6000 0.5033
8.3104 6050 0.4992
8.3791 6100 0.5299
8.4478 6150 0.4862
8.5165 6200 0.548
8.5852 6250 0.454
8.6538 6300 0.4941
8.7225 6350 0.5088
8.7912 6400 0.5065
8.8599 6450 0.4921
8.9286 6500 0.4756
8.9973 6550 0.5258
9.0659 6600 0.4658
9.1346 6650 0.4894
9.2033 6700 0.5097
9.2720 6750 0.493
9.3407 6800 0.5311
9.4093 6850 0.5157
9.4780 6900 0.5142
9.5467 6950 0.4664
9.6154 7000 0.528
9.6841 7050 0.5645
9.7527 7100 0.5214
9.8214 7150 0.4777
9.8901 7200 0.5449
9.9588 7250 0.492
10.0275 7300 0.4591
10.0962 7350 0.4576
10.1648 7400 0.4692
10.2335 7450 0.5415
10.3022 7500 0.4803
10.3709 7550 0.5487
10.4396 7600 0.5706
10.5082 7650 0.4815
10.5769 7700 0.4585
10.6456 7750 0.4861
10.7143 7800 0.4247
10.7830 7850 0.4906
10.8516 7900 0.5371
10.9203 7950 0.5393
10.9890 8000 0.4788
11.0577 8050 0.5038
11.1264 8100 0.4838
11.1951 8150 0.515
11.2637 8200 0.5299
11.3324 8250 0.5044
11.4011 8300 0.5045
11.4698 8350 0.465
11.5385 8400 0.5253
11.6071 8450 0.4517
11.6758 8500 0.5048
11.7445 8550 0.4733
11.8132 8600 0.47
11.8819 8650 0.4552
11.9505 8700 0.4203
12.0192 8750 0.395
12.0879 8800 0.5411
12.1566 8850 0.4911
12.2253 8900 0.4641
12.2940 8950 0.4608
12.3626 9000 0.4839
12.4313 9050 0.4491
12.5 9100 0.517
12.5687 9150 0.5031
12.6374 9200 0.4869
12.7060 9250 0.4856
12.7747 9300 0.4754
12.8434 9350 0.5167
12.9121 9400 0.5004
12.9808 9450 0.5293
13.0495 9500 0.4566
13.1181 9550 0.477
13.1868 9600 0.4501
13.2555 9650 0.4791
13.3242 9700 0.4746
13.3929 9750 0.4702
13.4615 9800 0.469
13.5302 9850 0.5046
13.5989 9900 0.4895
13.6676 9950 0.5223
13.7363 10000 0.4245
13.8049 10050 0.4701
13.8736 10100 0.4548
13.9423 10150 0.4998
14.0110 10200 0.4345
14.0797 10250 0.4371
14.1484 10300 0.5009
14.2170 10350 0.4816
14.2857 10400 0.4665
14.3544 10450 0.5047
14.4231 10500 0.5132
14.4918 10550 0.473
14.5604 10600 0.4387
14.6291 10650 0.4775
14.6978 10700 0.4522
14.7665 10750 0.4807
14.8352 10800 0.482
14.9038 10850 0.4625
14.9725 10900 0.5052

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}