SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
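The Pooling and Normalize modules above can be illustrated with a small NumPy sketch. It assumes `pooling_mode_mean_tokens: True` means averaging the token vectors while skipping padding positions (as marked by the attention mask), and that Normalize() then scales each result to unit length. Toy 4-dimensional vectors stand in for the real 384-dimensional BertModel outputs. `mean_pool` and `l2_normalize` are illustrative names, not library API.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, as the Normalize() module does."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


# Toy input: 2 sentences, 3 token positions, 4-dim vectors (the real model uses 384).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])  # the second sentence has one padding token

emb = l2_normalize(mean_pool(tokens, mask))

# For unit vectors, cosine similarity reduces to a plain dot product.
cos = emb @ emb.T
print(np.round(cos, 4))
```

Because every output vector has unit length, the cosine similarity listed under Similarity Function gives the same scores as a dot product on these embeddings.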

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GozdeA/tennis-multi-return-v4")
# Run inference
sentences = [
    'Show me previous game result',
    'what venue',
    'How is the tactical battle between the player and Amanda Anismova playing out?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6952, -0.0128],
#         [ 0.6952,  1.0000,  0.0505],
#         [-0.0128,  0.0505,  1.0000]])
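For semantic search or intent matching, one option is to rank a corpus of example queries by their similarity to an incoming query. The sketch below assumes the embeddings are already L2-normalized, which the Normalize() module guarantees for this model's output. So that it runs offline, small toy vectors replace `model.encode(...)` output. `top_k` is a hypothetical helper.

```python
import numpy as np


def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 3):
    """Return (index, score) pairs for the k corpus rows most similar to the query.

    Assumes both inputs are already L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = corpus_emb @ query_emb              # (n_corpus,)
    order = np.argsort(-scores)[:k]              # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Toy stand-ins for model.encode(...) output; real vectors are 384-dim.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.6, 0.8, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([0.8, 0.6, 0.0])

print(top_k(query, corpus, k=2))
```

With the real model, you would pass `model.encode("what venue")` as the query and `model.encode(corpus_sentences)` as the corpus, where `corpus_sentences` is a hypothetical list of example queries. `sentence_transformers.util.semantic_search` provides a library implementation of the same idea.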

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,600 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 4, mean 10.75, max 26 tokens
    • positive: min 4, mean 8.66, max 23 tokens
    • negative: min 4, mean 10.45, max 23 tokens
  • Samples (anchor | positive | negative):
    • What is the this season for Djokovic? | What's the this season for Djokovic? | What is the attacking this set for Djokovic?
    • who is projected? | momentum shift? | How does she's path to this round compare to Amanda Anismova's?
    • What's the sets won for Sinner? | Show me how many winners | What's the last year for Djokovic?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
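MultipleNegativesRankingLoss treats each anchor's own positive as the correct answer among all positives and negatives in the batch. The NumPy sketch below illustrates that idea with the parameters above (`scale=20.0`, cosine similarity). It is a simplified re-implementation for explanation, assuming one hard negative per anchor as in this dataset. It is not the library's PyTorch code, and `mnrl_loss` is a hypothetical name.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnrl_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
              scale: float = 20.0) -> float:
    """In-batch softmax cross-entropy with one hard negative per anchor.

    Anchor i is scored against every positive and every negative in the batch;
    its own positive (column i) is the correct class.
    """
    candidates = np.concatenate([positive, negative], axis=0)   # (2B, dim)
    logits = scale * cos_sim(anchor, candidates)                # (B, 2B)
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.arange(len(anchor))
    return float(-log_probs[labels, labels].mean())


# Toy batch of B=4 triplets with 8-dim embeddings (the real model uses 384 dims, B=16).
rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(4, 8)) for _ in range(3))
print(mnrl_loss(a, p, n))
```

If a model scores every candidate identically, this loss equals log(2B). With this card's batch size of 16 (32 candidates per anchor), that value is log(32) ≈ 3.47.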
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,900 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 4, mean 10.63, max 28 tokens
    • positive: min 4, mean 8.64, max 26 tokens
    • negative: min 4, mean 10.24, max 24 tokens
  • Samples (anchor | positive | negative):
    • What about Djokovic's games? | What's the how many winners for Djokovic? | ranking for the player?
    • What is the next match for Djokovic? | What are the next match for Djokovic? | What is the pre match for Djokovic?
    • What are the gaining momentum for Sinner? | What is the gaining momentum for Sinner? | What are the gaining control for Sinner?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
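The token-length statistics reported for both datasets are min/mean/max counts over the first 1000 samples of each column. A minimal sketch of that computation follows. `toy_count` is a hypothetical whitespace-based stand-in for the model's WordPiece tokenizer. The card does not state whether the `[CLS]`/`[SEP]` special tokens are counted, so including them here is an assumption.

```python
from statistics import mean
from typing import Callable, Dict, List


def length_stats(texts: List[str], count_tokens: Callable[[str], int],
                 limit: int = 1000) -> Dict[str, float]:
    """min / mean / max token counts over the first `limit` texts."""
    counts = [count_tokens(t) for t in texts[:limit]]
    return {"min": min(counts), "mean": round(mean(counts), 2), "max": max(counts)}


def toy_count(text: str) -> int:
    # Hypothetical stand-in for the real tokenizer: whitespace words
    # plus 2 for the (assumed) [CLS] and [SEP] special tokens.
    return len(text.split()) + 2


anchors = ["what venue", "Show me previous game result"]
print(length_stats(anchors, toy_count))
```

With the real model, `lambda t: len(model.tokenizer(t)["input_ids"])` could replace `toy_count`.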
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
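These settings determine the length of training. Assuming a single device and the linear scheduler with warmup listed under All Hyperparameters, training runs 725 steps per epoch (11,600 / 16) and 10,875 steps in total. This agrees with the training log, which reaches step 1450 at epoch 2.0. The sketch below reproduces that arithmetic and the shape of the warmup-then-decay learning-rate curve. Rounding the warmup length up with `ceil` is assumed to match the Hugging Face Trainer, and `linear_lr` is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 11_600
BATCH_SIZE = 16          # per_device_train_batch_size; a single device is assumed
EPOCHS = 15
WARMUP_RATIO = 0.1
BASE_LR = 2e-5

# dataloader_drop_last=False keeps a final partial batch, hence ceil.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    return BASE_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
```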

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0690 50 5.1095
0.1379 100 3.9909
0.2069 150 3.1963
0.2759 200 2.3301
0.3448 250 1.9904
0.4138 300 1.6705
0.4828 350 1.5659
0.5517 400 1.5497
0.6207 450 1.3563
0.6897 500 1.2982
0.7586 550 1.2509
0.8276 600 1.1737
0.8966 650 1.1486
0.9655 700 1.192
1.0345 750 0.9715
1.1034 800 1.0054
1.1724 850 1.0102
1.2414 900 0.9393
1.3103 950 0.9119
1.3793 1000 0.8589
1.4483 1050 0.9049
1.5172 1100 0.8774
1.5862 1150 0.8488
1.6552 1200 0.8382
1.7241 1250 0.7437
1.7931 1300 0.8023
1.8621 1350 0.7775
1.9310 1400 0.7756
2.0 1450 0.7273
2.0690 1500 0.6275
2.1379 1550 0.7331
2.2069 1600 0.629
2.2759 1650 0.7127
2.3448 1700 0.6503
2.4138 1750 0.7082
2.4828 1800 0.6939
2.5517 1850 0.6993
2.6207 1900 0.7067
2.6897 1950 0.6622
2.7586 2000 0.6499
2.8276 2050 0.6923
2.8966 2100 0.6208
2.9655 2150 0.5925
3.0345 2200 0.6697
3.1034 2250 0.6458
3.1724 2300 0.5709
3.2414 2350 0.5987
3.3103 2400 0.6252
3.3793 2450 0.6377
3.4483 2500 0.5739
3.5172 2550 0.6281
3.5862 2600 0.6186
3.6552 2650 0.5828
3.7241 2700 0.678
3.7931 2750 0.6257
3.8621 2800 0.5704
3.9310 2850 0.6151
4.0 2900 0.5898
4.0690 2950 0.5277
4.1379 3000 0.6128
4.2069 3050 0.6306
4.2759 3100 0.5739
4.3448 3150 0.5396
4.4138 3200 0.617
4.4828 3250 0.5119
4.5517 3300 0.6136
4.6207 3350 0.6303
4.6897 3400 0.6138
4.7586 3450 0.6214
4.8276 3500 0.5686
4.8966 3550 0.5901
4.9655 3600 0.6913
5.0345 3650 0.5706
5.1034 3700 0.6082
5.1724 3750 0.4755
5.2414 3800 0.5777
5.3103 3850 0.5515
5.3793 3900 0.5271
5.4483 3950 0.5816
5.5172 4000 0.5787
5.5862 4050 0.568
5.6552 4100 0.5593
5.7241 4150 0.542
5.7931 4200 0.5873
5.8621 4250 0.5647
5.9310 4300 0.6369
6.0 4350 0.5775
6.0690 4400 0.5324
6.1379 4450 0.5463
6.2069 4500 0.5234
6.2759 4550 0.4921
6.3448 4600 0.5716
6.4138 4650 0.6321
6.4828 4700 0.4881
6.5517 4750 0.5717
6.6207 4800 0.5922
6.6897 4850 0.5289
6.7586 4900 0.5182
6.8276 4950 0.5096
6.8966 5000 0.6062
6.9655 5050 0.6014
7.0345 5100 0.5033
7.1034 5150 0.4994
7.1724 5200 0.5842
7.2414 5250 0.5317
7.3103 5300 0.5112
7.3793 5350 0.5188
7.4483 5400 0.6174
7.5172 5450 0.484
7.5862 5500 0.5571
7.6552 5550 0.5043
7.7241 5600 0.5341
7.7931 5650 0.5225
7.8621 5700 0.5618
7.9310 5750 0.5537
8.0 5800 0.5811
8.0690 5850 0.5311
8.1379 5900 0.5585
8.2069 5950 0.5564
8.2759 6000 0.5469
8.3448 6050 0.5726
8.4138 6100 0.5329
8.4828 6150 0.55
8.5517 6200 0.5365
8.6207 6250 0.5847
8.6897 6300 0.5204
8.7586 6350 0.5112
8.8276 6400 0.5468
8.8966 6450 0.4871
8.9655 6500 0.5449
9.0345 6550 0.5237
9.1034 6600 0.5232
9.1724 6650 0.5075
9.2414 6700 0.5078
9.3103 6750 0.5366
9.3793 6800 0.5636
9.4483 6850 0.4743
9.5172 6900 0.4776
9.5862 6950 0.5571
9.6552 7000 0.56
9.7241 7050 0.5054
9.7931 7100 0.5431
9.8621 7150 0.5358
9.9310 7200 0.5395
10.0 7250 0.5394
10.0690 7300 0.57
10.1379 7350 0.4883
10.2069 7400 0.4884
10.2759 7450 0.4587
10.3448 7500 0.5076
10.4138 7550 0.5108
10.4828 7600 0.565
10.5517 7650 0.503
10.6207 7700 0.5645
10.6897 7750 0.509
10.7586 7800 0.4993
10.8276 7850 0.5464
10.8966 7900 0.5293
10.9655 7950 0.5384
11.0345 8000 0.5245
11.1034 8050 0.4647
11.1724 8100 0.4983
11.2414 8150 0.5168
11.3103 8200 0.5455
11.3793 8250 0.5069
11.4483 8300 0.5523
11.5172 8350 0.4875
11.5862 8400 0.4947
11.6552 8450 0.5022
11.7241 8500 0.5096
11.7931 8550 0.5768
11.8621 8600 0.5187
11.9310 8650 0.4883
12.0 8700 0.5039
12.0690 8750 0.527
12.1379 8800 0.5382
12.2069 8850 0.4912
12.2759 8900 0.5144
12.3448 8950 0.532
12.4138 9000 0.5233
12.4828 9050 0.4169
12.5517 9100 0.5278
12.6207 9150 0.5028
12.6897 9200 0.5227
12.7586 9250 0.4812
12.8276 9300 0.5299
12.8966 9350 0.5383
12.9655 9400 0.5245
13.0345 9450 0.5045
13.1034 9500 0.5619
13.1724 9550 0.4969
13.2414 9600 0.508
13.3103 9650 0.5095
13.3793 9700 0.5095
13.4483 9750 0.4886
13.5172 9800 0.5074
13.5862 9850 0.4761
13.6552 9900 0.4805
13.7241 9950 0.4559
13.7931 10000 0.5212
13.8621 10050 0.506
13.9310 10100 0.5086
14.0 10150 0.5232
14.0690 10200 0.5156
14.1379 10250 0.495
14.2069 10300 0.5226
14.2759 10350 0.4842
14.3448 10400 0.4514
14.4138 10450 0.4902
14.4828 10500 0.5068
14.5517 10550 0.5784
14.6207 10600 0.5646
14.6897 10650 0.4994
14.7586 10700 0.552
14.8276 10750 0.5216
14.8966 10800 0.5506
14.9655 10850 0.4286
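The logged loss falls quickly in the first epochs. After that it mostly stays between about 0.45 and 0.6, and averaging the values per epoch makes this plateau easier to read. The sketch below parses rows in the `Epoch Step Training Loss` layout used above. It checks that the epoch column equals step / 725, the steps-per-epoch figure implied by 11,600 samples at batch size 16, and assigns each step to its 1-indexed epoch. `epoch_means` is a hypothetical helper.

```python
import math
from collections import defaultdict

STEPS_PER_EPOCH = 725  # 11,600 samples / batch size 16


def epoch_means(log_text: str) -> dict:
    """Average logged training loss per (1-indexed) epoch.

    Each row is 'epoch step loss'; step s belongs to epoch ceil(s / STEPS_PER_EPOCH).
    """
    buckets = defaultdict(list)
    for line in log_text.strip().splitlines():
        epoch, step, loss = line.split()
        step = int(step)
        # Sanity check: the epoch column is step / steps_per_epoch, rounded to 4 places.
        assert abs(float(epoch) - step / STEPS_PER_EPOCH) < 1e-4, line
        buckets[math.ceil(step / STEPS_PER_EPOCH)].append(float(loss))
    return {e: round(sum(v) / len(v), 4) for e, v in sorted(buckets.items())}


sample = """
0.0690 50 5.1095
0.1379 100 3.9909
1.0345 750 0.9715
2.0 1450 0.7273
"""
print(epoch_means(sample))
```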

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 22.7M params (Safetensors, tensor type F32)