SentenceTransformer based on sentence-transformers/multi-qa-MiniLM-L6-dot-v1

This is a sentence-transformers model fine-tuned from sentence-transformers/multi-qa-MiniLM-L6-dot-v1. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
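
The Pooling module above has only pooling_mode_cls_token enabled, so the sentence embedding is the final hidden state of the first ([CLS]) token rather than an average over all tokens. A minimal sketch of the difference in plain Python follows. It uses invented 3-dimensional token vectors in place of the real 384-dimensional ones, and the function names are illustrative, not part of the library:

```python
def cls_pool(token_embeddings):
    """Return the vector of the first token ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens (shown only for contrast)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: [CLS], two word tokens, one padding token (values are invented).
tokens = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 4.0, 0.0],
    [5.0, 2.0, 4.0],
    [9.0, 9.0, 9.0],  # padding, masked out
]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(tokens)          # what this model's Pooling layer keeps
mean_vec = mean_pool(tokens, mask)  # what mean pooling would produce instead
print(cls_vec)   # [1.0, 0.0, 2.0]
print(mean_vec)  # [3.0, 2.0, 2.0]
```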

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Trelis/multi-qa-MiniLM-L6-dot-v1-2-constant-ep-MNRLtriplets-2e-5-batch32-cuda-overlap")
# Run inference
sentences = [
    'What is the minimum number of males and females required on the field of play in mixed gender competitions?',
    '5. 3. 1 this does not apply for players sent to the sin bin area. 5. 4 in mixed gender competitions, the maximum number of males allowed on the field of play is three ( 3 ), the minimum male requirement is one ( 1 ) and the minimum female requirement is one ( 1 ). 6 team coach and team officials 6. 1 the team coach ( s ) and team officials may be permitted inside the perimeter but shall be required to be positioned either in the interchange area or at the end of the field of play for the duration of the match. 6. 2 the team coach ( s ) and team officials may move from one position to the other but shall do so without delay. while in a position at the end of the field of play, the team coach ( s ) or team official must remain no closer than five ( 5 ) metres from the dead ball line and must not coach or communicate ( verbal or non - verbal ) with either team or the referees.',
    'tap and tap penalty the method of commencing the match, recommencing the match after half time and after a try has been scored. the tap is also the method of recommencing play when a penalty is awarded. the tap is taken by placing the ball on the ground at or behind the mark, releasing both hands from the ball, tapping the ball gently with either foot or touching the foot on the ball. the ball must not roll or move more than one ( 1 ) metre in any direction and must be retrieved cleanly, without touching the ground again. the player may face any direction and use either foot. provided it is at the mark, the ball does not have to be lifted from the ground prior to a tap being taken. team a group of players constituting one ( 1 ) side in a competition match. tfa touch football australia limited touch any contact between the player in possession and a defending player. a touch includes contact on the ball, hair or clothing and may be made by a defending player or by the player in possession. touch count the progressive number of touches that each team has before a change of possession, from zero ( 0 ) to six ( 6 ).',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
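
For semantic search, the first sentence in the example acts as the query and the remaining ones as candidate passages. The sketch below ranks passages by dot product, the score that the base model's name (dot-v1) suggests it was trained for. That choice is an assumption here; model.similarity uses whatever similarity function the model's configuration specifies. The vectors are toy 3-dimensional stand-ins and rank_by_dot is a hypothetical helper:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted by descending dot product."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented embeddings; real ones would come from model.encode(...).
query = [0.5, 1.0, -0.5]
passages = [
    [0.4, 0.9, -0.6],  # points in roughly the same direction as the query
    [-0.3, 0.1, 0.8],  # points elsewhere
]

ranking = rank_by_dot(query, passages)
best_index, best_score = ranking[0]
print(best_index)  # 0
```

Unlike cosine similarity, the dot product is sensitive to vector length, so scores are only comparable across passages for the same query.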

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: constant
  • warmup_ratio: 0.3
  • bf16: True
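
One interaction in this list is worth checking: lr_scheduler_type is constant while warmup_ratio is 0.3. To the best of my understanding of the Transformers schedulers, the plain constant schedule does not consult the warmup settings, and only constant_with_warmup ramps the learning rate up. This should be verified against the installed version. The sketch below is a hand-written illustration of the two behaviours, not the library's code; lr_multiplier is a hypothetical helper:

```python
def lr_multiplier(step, scheduler, warmup_steps=0):
    """Factor applied to the base learning rate at a given optimizer step."""
    if scheduler == "constant":
        return 1.0  # warmup settings are not consulted
    if scheduler == "constant_with_warmup":
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # linear ramp from 0 to 1
        return 1.0
    raise ValueError(f"unsupported scheduler: {scheduler}")


BASE_LR = 2e-5      # learning_rate from the list above
WARMUP_STEPS = 183  # 610 total steps (from the training log) * warmup_ratio 0.3

lr_constant = BASE_LR * lr_multiplier(10, "constant", WARMUP_STEPS)
lr_warmup = BASE_LR * lr_multiplier(10, "constant_with_warmup", WARMUP_STEPS)
print(lr_constant)  # full learning rate already at step 10
print(lr_warmup)    # still ramping up at step 10
```

If that reading is right, this run trained at 2e-05 from the first step, and the warmup_ratio value had no effect.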

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Columns: Epoch, Step, Training Loss, Evaluation Loss ("-" means no value was logged at that step)
0.0066 2 2.4302 -
0.0131 4 2.4247 -
0.0197 6 2.0174 -
0.0262 8 2.2159 -
0.0328 10 2.0163 -
0.0393 12 1.7183 -
0.0459 14 1.9459 -
0.0525 16 2.0123 -
0.0590 18 1.7977 -
0.0656 20 2.1162 -
0.0721 22 1.6443 -
0.0787 24 1.9009 -
0.0852 26 1.5068 -
0.0918 28 1.6354 -
0.0984 30 1.6703 -
0.1049 32 1.8509 -
0.1115 34 1.6663 -
0.1180 36 1.3685 -
0.1246 38 1.5531 -
0.1311 40 1.3564 -
0.1377 42 1.3271 -
0.1443 44 1.6339 -
0.1508 46 1.5644 -
0.1574 48 1.3918 -
0.1639 50 1.3628 -
0.1705 52 1.1994 -
0.1770 54 1.1174 -
0.1836 56 1.3724 -
0.1902 58 1.3164 -
0.1967 60 1.2333 -
0.2033 62 1.3354 -
0.2098 64 1.2378 -
0.2164 66 1.4894 -
0.2230 68 1.1909 -
0.2295 70 1.1961 -
0.2361 72 1.0392 -
0.2426 74 1.0383 -
0.2492 76 1.1072 -
0.2525 77 - 0.8909
0.2557 78 1.2151 -
0.2623 80 1.1497 -
0.2689 82 0.9377 -
0.2754 84 1.2349 -
0.2820 86 1.1121 -
0.2885 88 1.0621 -
0.2951 90 1.2678 -
0.3016 92 1.0484 -
0.3082 94 0.9637 -
0.3148 96 0.9904 -
0.3213 98 0.9988 -
0.3279 100 0.8051 -
0.3344 102 1.0701 -
0.3410 104 1.1697 -
0.3475 106 1.1753 -
0.3541 108 1.1611 -
0.3607 110 0.9969 -
0.3672 112 0.9606 -
0.3738 114 0.9209 -
0.3803 116 1.0459 -
0.3869 118 0.8615 -
0.3934 120 0.7766 -
0.4 122 1.0155 -
0.4066 124 0.9394 -
0.4131 126 0.8924 -
0.4197 128 0.8024 -
0.4262 130 1.0985 -
0.4328 132 1.0747 -
0.4393 134 1.0246 -
0.4459 136 0.9245 -
0.4525 138 0.909 -
0.4590 140 1.0893 -
0.4656 142 1.0213 -
0.4721 144 0.8544 -
0.4787 146 0.9737 -
0.4852 148 0.8735 -
0.4918 150 0.928 -
0.4984 152 0.8356 -
0.5049 154 1.0019 0.7711
0.5115 156 1.0054 -
0.5180 158 0.8963 -
0.5246 160 0.9006 -
0.5311 162 0.9877 -
0.5377 164 1.0281 -
0.5443 166 0.8472 -
0.5508 168 0.9504 -
0.5574 170 1.0462 -
0.5639 172 0.9501 -
0.5705 174 0.8996 -
0.5770 176 1.0198 -
0.5836 178 0.9341 -
0.5902 180 0.8529 -
0.5967 182 0.939 -
0.6033 184 1.0716 -
0.6098 186 0.9437 -
0.6164 188 0.7956 -
0.6230 190 0.8259 -
0.6295 192 0.941 -
0.6361 194 0.8254 -
0.6426 196 0.8056 -
0.6492 198 0.9525 -
0.6557 200 0.7497 -
0.6623 202 0.9103 -
0.6689 204 1.0092 -
0.6754 206 0.8893 -
0.6820 208 0.924 -
0.6885 210 0.8118 -
0.6951 212 0.7734 -
0.7016 214 0.8612 -
0.7082 216 0.6743 -
0.7148 218 0.9175 -
0.7213 220 0.9795 -
0.7279 222 0.9852 -
0.7344 224 0.7345 -
0.7410 226 0.9914 -
0.7475 228 0.9152 -
0.7541 230 1.0494 -
0.7574 231 - 0.7461
0.7607 232 0.8496 -
0.7672 234 0.8374 -
0.7738 236 0.796 -
0.7803 238 0.8899 -
0.7869 240 1.055 -
0.7934 242 0.9787 -
0.8 244 0.8813 -
0.8066 246 1.0675 -
0.8131 248 1.0196 -
0.8197 250 0.7574 -
0.8262 252 0.9044 -
0.8328 254 0.8997 -
0.8393 256 0.9668 -
0.8459 258 0.8887 -
0.8525 260 1.0042 -
0.8590 262 1.0572 -
0.8656 264 0.8395 -
0.8721 266 0.7637 -
0.8787 268 0.952 -
0.8852 270 0.9178 -
0.8918 272 0.7949 -
0.8984 274 0.8409 -
0.9049 276 0.8708 -
0.9115 278 0.8427 -
0.9180 280 0.9451 -
0.9246 282 0.8579 -
0.9311 284 0.7472 -
0.9377 286 0.8878 -
0.9443 288 0.8266 -
0.9508 290 0.7753 -
0.9574 292 0.7455 -
0.9639 294 0.9418 -
0.9705 296 0.8795 -
0.9770 298 0.8713 -
0.9836 300 0.896 -
0.9902 302 0.7666 -
0.9967 304 0.8474 -
1.0033 306 0.5415 -
1.0098 308 0.9159 0.7310
1.0164 310 1.049 -
1.0230 312 0.9572 -
1.0295 314 0.9994 -
1.0361 316 0.8166 -
1.0426 318 0.8915 -
1.0492 320 0.8417 -
1.0557 322 0.6382 -
1.0623 324 1.1689 -
1.0689 326 0.7979 -
1.0754 328 0.9044 -
1.0820 330 1.0126 -
1.0885 332 0.9459 -
1.0951 334 0.7851 -
1.1016 336 0.8744 -
1.1082 338 0.8425 -
1.1148 340 0.8789 -
1.1213 342 0.8451 -
1.1279 344 0.8488 -
1.1344 346 0.8097 -
1.1410 348 0.7656 -
1.1475 350 0.8751 -
1.1541 352 0.7859 -
1.1607 354 0.7413 -
1.1672 356 1.0012 -
1.1738 358 0.7506 -
1.1803 360 0.8725 -
1.1869 362 0.9096 -
1.1934 364 0.9487 -
1.2 366 0.7911 -
1.2066 368 0.9752 -
1.2131 370 0.9904 -
1.2197 372 0.7559 -
1.2262 374 0.7669 -
1.2328 376 0.8321 -
1.2393 378 0.9426 -
1.2459 380 0.928 -
1.2525 382 0.8514 -
1.2590 384 0.8755 -
1.2623 385 - 0.7263
1.2656 386 0.9364 -
1.2721 388 0.9249 -
1.2787 390 0.8506 -
1.2852 392 0.9558 -
1.2918 394 0.9067 -
1.2984 396 0.8908 -
1.3049 398 0.6504 -
1.3115 400 0.7768 -
1.3180 402 0.6553 -
1.3246 404 0.6869 -
1.3311 406 0.9872 -
1.3377 408 0.828 -
1.3443 410 0.896 -
1.3508 412 0.8047 -
1.3574 414 0.8023 -
1.3639 416 1.0378 -
1.3705 418 0.8644 -
1.3770 420 0.9643 -
1.3836 422 0.7227 -
1.3902 424 0.7723 -
1.3967 426 0.9843 -
1.4033 428 0.7796 -
1.4098 430 0.8349 -
1.4164 432 0.8458 -
1.4230 434 0.6638 -
1.4295 436 0.85 -
1.4361 438 0.8938 -
1.4426 440 0.9992 -
1.4492 442 0.8008 -
1.4557 444 0.8251 -
1.4623 446 0.94 -
1.4689 448 0.911 -
1.4754 450 0.8789 -
1.4820 452 0.7201 -
1.4885 454 0.9465 -
1.4951 456 0.7776 -
1.5016 458 0.9056 -
1.5082 460 0.9087 -
1.5148 462 0.9425 0.7224
1.5213 464 0.8603 -
1.5279 466 0.8143 -
1.5344 468 1.0147 -
1.5410 470 0.7188 -
1.5475 472 0.8249 -
1.5541 474 0.7593 -
1.5607 476 0.9883 -
1.5672 478 0.7453 -
1.5738 480 0.7667 -
1.5803 482 0.7323 -
1.5869 484 0.8276 -
1.5934 486 0.7984 -
1.6 488 0.8216 -
1.6066 490 0.6734 -
1.6131 492 0.6356 -
1.6197 494 0.8072 -
1.6262 496 0.7929 -
1.6328 498 0.8359 -
1.6393 500 0.8005 -
1.6459 502 0.8072 -
1.6525 504 0.7875 -
1.6590 506 0.7381 -
1.6656 508 0.8326 -
1.6721 510 0.8628 -
1.6787 512 0.9308 -
1.6852 514 0.7246 -
1.6918 516 0.8821 -
1.6984 518 0.7214 -
1.7049 520 0.7731 -
1.7115 522 0.7165 -
1.7180 524 0.8376 -
1.7246 526 0.8067 -
1.7311 528 0.8293 -
1.7377 530 0.9654 -
1.7443 532 0.6332 -
1.7508 534 0.8155 -
1.7574 536 0.7569 -
1.7639 538 0.7649 -
1.7672 539 - 0.7193
1.7705 540 0.7826 -
1.7770 542 0.7806 -
1.7836 544 0.701 -
1.7902 546 0.8998 -
1.7967 548 0.7879 -
1.8033 550 0.9837 -
1.8098 552 0.8297 -
1.8164 554 0.8317 -
1.8230 556 0.8819 -
1.8295 558 0.6683 -
1.8361 560 0.8085 -
1.8426 562 0.7737 -
1.8492 564 0.7873 -
1.8557 566 0.7587 -
1.8623 568 0.7513 -
1.8689 570 0.9404 -
1.8754 572 0.7818 -
1.8820 574 0.761 -
1.8885 576 0.7163 -
1.8951 578 0.7994 -
1.9016 580 0.8483 -
1.9082 582 0.7287 -
1.9148 584 0.8435 -
1.9213 586 0.8493 -
1.9279 588 0.8544 -
1.9344 590 0.7437 -
1.9410 592 0.7449 -
1.9475 594 0.7808 -
1.9541 596 0.8658 -
1.9607 598 0.6678 -
1.9672 600 0.7104 -
1.9738 602 0.8293 -
1.9803 604 0.8346 -
1.9869 606 0.885 -
1.9934 608 0.6521 -
2.0 610 0.3965 -
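
The log can be cross-checked against the hyperparameters with a little arithmetic. The sketch below derives the number of steps per epoch, the evaluation interval, and a rough range for the training-set size. The size estimate assumes a single device (not stated in this card) together with the listed gradient_accumulation_steps of 1 and dataloader_drop_last of False. The evaluation rows are copied from the last column of the log:

```python
LAST_STEP = 610    # final row of the log
LAST_EPOCH = 2.0   # epoch value on that row
BATCH_SIZE = 32    # per_device_train_batch_size

# (step, evaluation loss) for every row that has a value in the last column.
eval_rows = [
    (77, 0.8909), (154, 0.7711), (231, 0.7461), (308, 0.7310),
    (385, 0.7263), (462, 0.7224), (539, 0.7193),
]

steps_per_epoch = round(LAST_STEP / LAST_EPOCH)
eval_interval = eval_rows[1][0] - eval_rows[0][0]

# With a partial final batch allowed, N examples need ceil(N / 32) steps,
# so 305 steps per epoch bounds N from both sides.
min_examples = (steps_per_epoch - 1) * BATCH_SIZE + 1
max_examples = steps_per_epoch * BATCH_SIZE

best_step, best_loss = min(eval_rows, key=lambda row: row[1])

print(steps_per_epoch)             # 305
print(eval_interval)               # 77
print(min_examples, max_examples)  # 9729 9760
print(best_step, best_loss)        # 539 0.7193
```

The evaluation loss falls at every checkpoint, from 0.8909 at step 77 to 0.7193 at step 539, although the improvement after the first epoch is small.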

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.1.1+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.17.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
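
For readers unfamiliar with the loss cited above, the following is a minimal plain-Python sketch of the idea behind MultipleNegativesRankingLoss with triplets: each anchor must pick its own positive out of every positive and every hard negative in the batch, scored as a softmax cross-entropy. The cosine similarity and the scale of 20 are assumptions based on the library's usual defaults; this card does not state the settings used for this model, and mnrl_loss is an illustrative name rather than the library's implementation:

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should select positive i."""
    candidates = positives + negatives  # in-batch positives, then hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Invented 2-dimensional embeddings for a batch of two triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned_loss = mnrl_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
swapped_loss = mnrl_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(aligned_loss < swapped_loss)  # True
```

Because every other example in the batch serves as a negative, the batch size of 32 used here also determines how many candidates each anchor is ranked against.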

Model size: 22.7M params (Safetensors, tensor type F32)