SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
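
The Pooling and Normalize modules listed above average the token vectors that are not padding and then scale the result to unit length. The following is a minimal pure-Python sketch of those two steps, assuming toy 3-dimensional token vectors in place of the 384-dimensional transformer outputs. The function names `mean_pool` and `l2_normalize` are illustrative and not part of the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]

# Toy input (hypothetical values): three tokens, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # padding token is ignored
embedding = l2_normalize(pooled)      # unit-length sentence vector
print(pooled, embedding)
```

Because the final vectors have unit length, the cosine similarity of two embeddings reduces to their dot product.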

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-mlp-v2")
# Run inference
sentences = [
    'What is the can he win for Djokovic?',
    'form shift?',
    'What is the set time for the player?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6634, 0.0789],
#         [0.6634, 1.0000, 0.1159],
#         [0.0789, 0.1159, 1.0000]])
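
One way to read a similarity matrix like the one printed above is to look up, for each sentence, its closest other sentence. The sketch below does this in plain Python on the scores copied from the example output, so it runs without downloading the model. The helper name `nearest_neighbor` is an assumption made for illustration.

```python
# Similarity matrix copied from the example output above.
similarities = [
    [1.0000, 0.6634, 0.0789],
    [0.6634, 1.0000, 0.1159],
    [0.0789, 0.1159, 1.0000],
]

def nearest_neighbor(scores, index):
    """Return (position, score) of the most similar *other* item."""
    candidates = [(j, s) for j, s in enumerate(scores[index]) if j != index]
    return max(candidates, key=lambda pair: pair[1])

for i in range(len(similarities)):
    j, score = nearest_neighbor(similarities, i)
    print(f"sentence {i} -> sentence {j} (cosine {score:.4f})")
```

The diagonal is skipped because every sentence has similarity 1.0 with itself.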

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,600 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor         positive       negative
    type      string         string         string
    min       4 tokens       4 tokens       4 tokens
    mean      10.77 tokens   8.81 tokens    10.67 tokens
    max       28 tokens      28 tokens      28 tokens
  • Samples:
    anchor | positive | negative
    What is the overall return for Djokovic? | overall for Djokovic? | What is the return winners for Djokovic?
    What is the return winner count for Alcaraz and Fritz? | how many winners? | What is the how good is his return for Sinner?
    backhand for he? | What is the backhand quality for he? | What is the backhand today for he?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
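
With (anchor, positive, negative) triplets, this loss asks each anchor to pick its own positive out of all positives and negatives in the batch, using cosine similarities multiplied by the scale of 20.0 as logits. The following is a rough pure-Python sketch of that idea, not the library's implementation. The function names and the 2-dimensional embeddings are hypothetical.

```python
import math

def cos_sim(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must select positive i among
    every positive and every negative in the batch."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy 2-d embeddings (hypothetical values, not model output).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.2], [0.2, -1.0]]

good = mnr_loss(anchors, positives, negatives)
bad = mnr_loss(anchors, negatives, positives)  # roles swapped on purpose
print(good, bad)
```

The loss is near zero when each anchor is closest to its own positive and grows large when the roles are swapped.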
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,900 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor         positive       negative
    type      string         string         string
    min       4 tokens       4 tokens       4 tokens
    mean      10.9 tokens    8.62 tokens    10.38 tokens
    max       28 tokens      26 tokens      25 tokens
  • Samples:
    anchor | positive | negative
    How does Shelton's game match up against Lorenzo Sonego's strengths? | key factors? | What is the date of birth for Djokovic?
    What is the what are the key for Sinner? | What's the what are the key for Sinner? | What are the what is a for Sinner?
    professional career stats? | professional career titles? | How does Shelton's forehand compare to their career average?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
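
These settings, together with the 11,600 training samples, fix the length of the run and the shape of the learning-rate curve. The sketch below works through the arithmetic in plain Python, assuming a single device, no gradient accumulation, and a linear schedule with warmup. The rounding of the warmup step count is an assumption, and the function name `learning_rate` is illustrative.

```python
import math

TRAIN_SAMPLES = 11_600
BATCH_SIZE = 16        # per_device_train_batch_size
EPOCHS = 15            # num_train_epochs
PEAK_LR = 2e-5         # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio

# Assumes one device and gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumption: the warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 544, warmup_steps, total_steps):
    print(step, learning_rate(step))
```

The 725 steps per epoch agree with the training logs, where step 1450 is reported as epoch 2.0.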

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0690 50 4.8633
0.1379 100 4.2929
0.2069 150 3.2473
0.2759 200 2.4133
0.3448 250 2.0601
0.4138 300 1.7225
0.4828 350 1.631
0.5517 400 1.5036
0.6207 450 1.3556
0.6897 500 1.2699
0.7586 550 1.3131
0.8276 600 1.1743
0.8966 650 1.0491
0.9655 700 1.2265
1.0345 750 1.0786
1.1034 800 1.0451
1.1724 850 1.0379
1.2414 900 0.9378
1.3103 950 0.8659
1.3793 1000 0.8908
1.4483 1050 0.8333
1.5172 1100 0.7814
1.5862 1150 0.7764
1.6552 1200 0.8071
1.7241 1250 0.7394
1.7931 1300 0.7137
1.8621 1350 0.7669
1.9310 1400 0.6652
2.0 1450 0.7612
2.0690 1500 0.6847
2.1379 1550 0.6511
2.2069 1600 0.7297
2.2759 1650 0.6836
2.3448 1700 0.6733
2.4138 1750 0.6125
2.4828 1800 0.664
2.5517 1850 0.6212
2.6207 1900 0.6613
2.6897 1950 0.645
2.7586 2000 0.6311
2.8276 2050 0.6823
2.8966 2100 0.6608
2.9655 2150 0.6408
3.0345 2200 0.6364
3.1034 2250 0.5752
3.1724 2300 0.6431
3.2414 2350 0.585
3.3103 2400 0.6852
3.3793 2450 0.6743
3.4483 2500 0.5907
3.5172 2550 0.5632
3.5862 2600 0.5853
3.6552 2650 0.5906
3.7241 2700 0.6471
3.7931 2750 0.5809
3.8621 2800 0.5832
3.9310 2850 0.6011
4.0 2900 0.5926
4.0690 2950 0.5962
4.1379 3000 0.6648
4.2069 3050 0.5759
4.2759 3100 0.5162
4.3448 3150 0.5945
4.4138 3200 0.5859
4.4828 3250 0.6066
4.5517 3300 0.5536
4.6207 3350 0.5112
4.6897 3400 0.5094
4.7586 3450 0.5056
4.8276 3500 0.573
4.8966 3550 0.5425
4.9655 3600 0.5641
5.0345 3650 0.5409
5.1034 3700 0.58
5.1724 3750 0.5669
5.2414 3800 0.6087
5.3103 3850 0.557
5.3793 3900 0.5191
5.4483 3950 0.5321
5.5172 4000 0.5965
5.5862 4050 0.5612
5.6552 4100 0.6181
5.7241 4150 0.5144
5.7931 4200 0.5187
5.8621 4250 0.5362
5.9310 4300 0.5215
6.0 4350 0.5578
6.0690 4400 0.5291
6.1379 4450 0.512
6.2069 4500 0.5702
6.2759 4550 0.5935
6.3448 4600 0.5376
6.4138 4650 0.5012
6.4828 4700 0.6246
6.5517 4750 0.5038
6.6207 4800 0.5739
6.6897 4850 0.5765
6.7586 4900 0.58
6.8276 4950 0.5462
6.8966 5000 0.5087
6.9655 5050 0.5357
7.0345 5100 0.5352
7.1034 5150 0.5002
7.1724 5200 0.5196
7.2414 5250 0.5668
7.3103 5300 0.5104
7.3793 5350 0.5029
7.4483 5400 0.481
7.5172 5450 0.5567
7.5862 5500 0.5425
7.6552 5550 0.4884
7.7241 5600 0.4854
7.7931 5650 0.5459
7.8621 5700 0.5201
7.9310 5750 0.5288
8.0 5800 0.5055
8.0690 5850 0.4656
8.1379 5900 0.5538
8.2069 5950 0.5513
8.2759 6000 0.5078
8.3448 6050 0.508
8.4138 6100 0.5403
8.4828 6150 0.4711
8.5517 6200 0.5024
8.6207 6250 0.4886
8.6897 6300 0.5446
8.7586 6350 0.4953
8.8276 6400 0.5395
8.8966 6450 0.571
8.9655 6500 0.567
9.0345 6550 0.5684
9.1034 6600 0.543
9.1724 6650 0.5449
9.2414 6700 0.4713
9.3103 6750 0.5046
9.3793 6800 0.5785
9.4483 6850 0.4744
9.5172 6900 0.5364
9.5862 6950 0.5523
9.6552 7000 0.5245
9.7241 7050 0.5005
9.7931 7100 0.5355
9.8621 7150 0.5248
9.9310 7200 0.4924
10.0 7250 0.4885
10.0690 7300 0.4708
10.1379 7350 0.5075
10.2069 7400 0.4943
10.2759 7450 0.4926
10.3448 7500 0.4757
10.4138 7550 0.5305
10.4828 7600 0.4626
10.5517 7650 0.5161
10.6207 7700 0.48
10.6897 7750 0.466
10.7586 7800 0.5556
10.8276 7850 0.51
10.8966 7900 0.5185
10.9655 7950 0.5485
11.0345 8000 0.4591
11.1034 8050 0.523
11.1724 8100 0.5295
11.2414 8150 0.4482
11.3103 8200 0.5275
11.3793 8250 0.4849
11.4483 8300 0.5374
11.5172 8350 0.4621
11.5862 8400 0.4374
11.6552 8450 0.4855
11.7241 8500 0.5147
11.7931 8550 0.564
11.8621 8600 0.4763
11.9310 8650 0.4456
12.0 8700 0.4906
12.0690 8750 0.4912
12.1379 8800 0.4556
12.2069 8850 0.4936
12.2759 8900 0.4864
12.3448 8950 0.5262
12.4138 9000 0.458
12.4828 9050 0.5631
12.5517 9100 0.5144
12.6207 9150 0.4966
12.6897 9200 0.5589
12.7586 9250 0.4718
12.8276 9300 0.5124
12.8966 9350 0.5362
12.9655 9400 0.482
13.0345 9450 0.4821
13.1034 9500 0.4984
13.1724 9550 0.4646
13.2414 9600 0.4825
13.3103 9650 0.4957
13.3793 9700 0.4739
13.4483 9750 0.523
13.5172 9800 0.4892
13.5862 9850 0.4803
13.6552 9900 0.502
13.7241 9950 0.4828
13.7931 10000 0.5034
13.8621 10050 0.5151
13.9310 10100 0.5292
14.0 10150 0.5227
14.0690 10200 0.4853
14.1379 10250 0.4528
14.2069 10300 0.4591
14.2759 10350 0.4482
14.3448 10400 0.4412
14.4138 10450 0.4854
14.4828 10500 0.4734
14.5517 10550 0.4749
14.6207 10600 0.5448
14.6897 10650 0.5117
14.7586 10700 0.4776
14.8276 10750 0.4638
14.8966 10800 0.5636
14.9655 10850 0.547

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 22.7M params (Safetensors; tensor type: F32)