SentenceTransformer based on cross-encoder/ms-marco-MiniLM-L-6-v2

This is a sentence-transformers model finetuned from cross-encoder/ms-marco-MiniLM-L-6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
Maximum Sequence Length: 512 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
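
The Pooling module above has pooling_mode_mean_tokens set to True, so a sentence embedding is the average of the token embeddings, with padding positions left out. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the helper name mean_pool and the toy 3-dimensional inputs are hypothetical and chosen only for illustration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in sums]


# Toy example: three token vectors (3 dimensions instead of 384); the last is padding.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0]
```

The padding vector does not influence the result because its mask entry is 0.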

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Trelis/ms-marco-MiniLM-L-6-v2-2-cst-ep-MNRLtriplets-2e-5-batch32-gpu-overlap")
# Run inference
sentences = [
    'What is the minimum number of digits allowed for identifying numbers according to clause 4.3.1?',
    '2. 2 teams playing unregistered players are liable to forfeit any match in which unregistered players have competed. fit playing rules - 5th edition copyright © touch football australia 2020 5 3 the ball 3. 1 the game is played with an oval, inflated ball of a shape, colour and size approved by fit or the nta. 3. 2 the ball shall be inflated to the manufacturers ’ recommended air pressure. 3. 3 the referee shall immediately pause the match if the size and shape of the ball no longer complies with clauses 3. 1 or 3. 2 to allow for the ball to replaced or the issue rectified. 3. 4 the ball must not be hidden under player attire. 4 playing uniform 4. 1 participating players are to be correctly attired in matching team uniforms 4. 2 playing uniforms consist of shirt, singlet or other item as approved by the nta or nta competition provider, shorts and / or tights and socks. 4. 3 all players are to wear a unique identifying number not less than 16cm in height, clearly displayed on the rear of the playing top. 4. 3. 1 identifying numbers must feature no more than two ( 2 ) digits.',
    '24. 5 for the avoidance of doubt for clauses 24. 3 and 24. 4 the non - offending team will retain a numerical advantage on the field of play during the drop - off. 25 match officials 25. 1 the referee is the sole judge on all match related matters inside the perimeter for the duration of a match, has jurisdiction over all players, coaches and officials and is required to : 25. 1. 1 inspect the field of play, line markings and markers prior to the commencement of the match to ensure the safety of all participants. 25. 1. 2 adjudicate on the rules of the game ; 25. 1. 3 impose any sanction necessary to control the match ; 25. 1. 4 award tries and record the progressive score ; 25. 1. 5 maintain a count of touches during each possession ; 25. 1. 6 award penalties for infringements against the rules ; and 25. 1. 7 report to the relevant competition administration any sin bins, dismissals or injuries to any participant sustained during a match. 25. 2 only team captains are permitted to seek clarification of a decision directly from the referee. an approach may only be made during a break in play or at the discretion of the referee.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
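
Since the similarity function listed for this model is cosine similarity, the matrix returned by model.similarity holds one cosine score per pair of inputs. The sketch below reproduces that computation in pure Python on toy 2-dimensional vectors and then ranks the remaining rows against the first one, mirroring the query-plus-passages layout of the example above. The helper cosine_similarity_matrix and the toy vectors are hypothetical, introduced only for illustration.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return cos(a[i], b[j]) for every pair of rows in a and b."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    matrix = []
    for u in a:
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            denom = norm(u) * norm(v)
            row.append(dot / denom if denom else 0.0)
        matrix.append(row)
    return matrix


# Toy 2-dimensional "embeddings"; real ones from this model have 384 dimensions.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
scores = cosine_similarity_matrix(toy, toy)

# Rank the other rows against row 0 (treated as the query), best match first.
ranking = sorted(range(1, len(toy)), key=lambda j: scores[0][j], reverse=True)
print(ranking)  # [2, 1]
```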

Training Details

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 2
lr_scheduler_type: constant
warmup_ratio: 0.3
bf16: True
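
One combination above deserves a note: lr_scheduler_type is constant while warmup_ratio is 0.3. In the Transformers library, the constant schedule is understood to apply no warmup (warmup is only used by schedules such as constant_with_warmup), so the warmup ratio probably had no effect in this run. The sketch below contrasts the two curves. It is an illustration under that assumption, not the Transformers implementation; lr_at_step is a hypothetical helper, and the total of 610 steps is taken from the training log.

```python
import math


def lr_at_step(step, base_lr=2e-5, scheduler="constant",
               total_steps=610, warmup_ratio=0.3):
    """Illustrative learning-rate curve; not the Transformers implementation."""
    if scheduler == "constant":
        # No warmup phase: the rate is flat from the first step.
        return base_lr
    if scheduler == "constant_with_warmup":
        warmup_steps = math.ceil(total_steps * warmup_ratio)
        # Linear ramp from 0 to base_lr, then flat.
        return base_lr * min(1.0, step / max(1, warmup_steps))
    raise ValueError(f"unsupported scheduler: {scheduler}")


print(lr_at_step(0))                                    # 2e-05
print(lr_at_step(0, scheduler="constant_with_warmup"))  # 0.0
```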

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: constant
lr_scheduler_kwargs: {}
warmup_ratio: 0.3
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.0066	2	4.4256	-
0.0131	4	4.1504	-
0.0197	6	4.0494	-
0.0262	8	4.0447	-
0.0328	10	3.9851	-
0.0393	12	3.9284	-
0.0459	14	3.9155	-
0.0525	16	3.8791	-
0.0590	18	3.8663	-
0.0656	20	3.9012	-
0.0721	22	3.8999	-
0.0787	24	3.7895	-
0.0852	26	3.7235	-
0.0918	28	3.7938	-
0.0984	30	3.5057	-
0.1049	32	3.5776	-
0.1115	34	3.5092	-
0.1180	36	3.7226	-
0.1246	38	3.5426	-
0.1311	40	3.7318	-
0.1377	42	3.529	-
0.1443	44	3.5977	-
0.1508	46	3.6484	-
0.1574	48	3.5026	-
0.1639	50	3.4568	-
0.1705	52	3.6119	-
0.1770	54	3.4206	-
0.1836	56	3.3701	-
0.1902	58	3.3232	-
0.1967	60	3.3398	-
0.2033	62	3.333	-
0.2098	64	3.3587	-
0.2164	66	3.1304	-
0.2230	68	3.0618	-
0.2295	70	3.145	-
0.2361	72	3.2074	-
0.2426	74	3.0436	-
0.2492	76	3.0572	-
0.2525	77	-	3.0810
0.2557	78	3.1225	-
0.2623	80	2.8197	-
0.2689	82	2.8979	-
0.2754	84	2.7827	-
0.2820	86	2.9472	-
0.2885	88	2.918	-
0.2951	90	2.7035	-
0.3016	92	2.6876	-
0.3082	94	2.8322	-
0.3148	96	2.6335	-
0.3213	98	2.3754	-
0.3279	100	3.0978	-
0.3344	102	2.4946	-
0.3410	104	2.5085	-
0.3475	106	2.7456	-
0.3541	108	2.3934	-
0.3607	110	2.3222	-
0.3672	112	2.4773	-
0.3738	114	2.6684	-
0.3803	116	2.2435	-
0.3869	118	2.243	-
0.3934	120	2.228	-
0.4	122	2.4652	-
0.4066	124	2.2113	-
0.4131	126	2.0805	-
0.4197	128	2.5041	-
0.4262	130	2.4489	-
0.4328	132	2.2474	-
0.4393	134	2.0252	-
0.4459	136	2.257	-
0.4525	138	1.9381	-
0.4590	140	2.0183	-
0.4656	142	2.1021	-
0.4721	144	2.1508	-
0.4787	146	1.9669	-
0.4852	148	1.7468	-
0.4918	150	1.8776	-
0.4984	152	1.8081	-
0.5049	154	1.6799	1.6088
0.5115	156	1.9628	-
0.5180	158	1.8253	-
0.5246	160	1.7791	-
0.5311	162	1.8463	-
0.5377	164	1.6357	-
0.5443	166	1.6531	-
0.5508	168	1.6747	-
0.5574	170	1.5666	-
0.5639	172	1.7272	-
0.5705	174	1.6045	-
0.5770	176	1.3786	-
0.5836	178	1.6547	-
0.5902	180	1.6416	-
0.5967	182	1.4796	-
0.6033	184	1.4595	-
0.6098	186	1.4106	-
0.6164	188	1.4844	-
0.6230	190	1.4581	-
0.6295	192	1.4922	-
0.6361	194	1.2978	-
0.6426	196	1.2612	-
0.6492	198	1.4725	-
0.6557	200	1.3162	-
0.6623	202	1.3736	-
0.6689	204	1.4553	-
0.6754	206	1.4011	-
0.6820	208	1.2523	-
0.6885	210	1.3732	-
0.6951	212	1.3721	-
0.7016	214	1.5262	-
0.7082	216	1.2631	-
0.7148	218	1.6174	-
0.7213	220	1.4252	-
0.7279	222	1.3527	-
0.7344	224	1.1969	-
0.7410	226	1.2901	-
0.7475	228	1.4379	-
0.7541	230	1.1332	-
0.7574	231	-	1.0046
0.7607	232	1.3693	-
0.7672	234	1.3097	-
0.7738	236	1.2314	-
0.7803	238	1.0873	-
0.7869	240	1.2882	-
0.7934	242	1.1723	-
0.8	244	1.1748	-
0.8066	246	1.2916	-
0.8131	248	1.0894	-
0.8197	250	1.2299	-
0.8262	252	1.207	-
0.8328	254	1.1361	-
0.8393	256	1.1323	-
0.8459	258	1.0927	-
0.8525	260	1.1433	-
0.8590	262	1.1088	-
0.8656	264	1.1384	-
0.8721	266	1.0962	-
0.8787	268	1.1878	-
0.8852	270	1.0113	-
0.8918	272	1.1411	-
0.8984	274	1.0289	-
0.9049	276	1.0163	-
0.9115	278	1.2859	-
0.9180	280	0.9449	-
0.9246	282	1.0941	-
0.9311	284	1.0908	-
0.9377	286	1.1028	-
0.9443	288	1.0633	-
0.9508	290	1.1004	-
0.9574	292	1.0483	-
0.9639	294	1.0064	-
0.9705	296	1.0088	-
0.9770	298	1.0068	-
0.9836	300	1.1903	-
0.9902	302	0.9401	-
0.9967	304	0.8369	-
1.0033	306	0.5046	-
1.0098	308	1.0626	0.8660
1.0164	310	0.9587	-
1.0230	312	1.0565	-
1.0295	314	1.1329	-
1.0361	316	1.1857	-
1.0426	318	0.9777	-
1.0492	320	0.9883	-
1.0557	322	0.9076	-
1.0623	324	0.7942	-
1.0689	326	1.1952	-
1.0754	328	0.9726	-
1.0820	330	1.0663	-
1.0885	332	1.0337	-
1.0951	334	0.9522	-
1.1016	336	0.9813	-
1.1082	338	0.9057	-
1.1148	340	1.0076	-
1.1213	342	0.8557	-
1.1279	344	0.9341	-
1.1344	346	0.9188	-
1.1410	348	1.091	-
1.1475	350	0.8205	-
1.1541	352	1.0509	-
1.1607	354	0.9201	-
1.1672	356	1.0741	-
1.1738	358	0.8662	-
1.1803	360	0.9468	-
1.1869	362	0.8604	-
1.1934	364	0.8141	-
1.2	366	0.9475	-
1.2066	368	0.8407	-
1.2131	370	0.764	-
1.2197	372	0.798	-
1.2262	374	0.8205	-
1.2328	376	0.7995	-
1.2393	378	0.9305	-
1.2459	380	0.858	-
1.2525	382	0.8465	-
1.2590	384	0.7691	-
1.2623	385	-	0.7879
1.2656	386	1.0073	-
1.2721	388	0.8026	-
1.2787	390	0.8108	-
1.2852	392	0.7783	-
1.2918	394	0.8766	-
1.2984	396	0.8576	-
1.3049	398	0.884	-
1.3115	400	0.9547	-
1.3180	402	0.9231	-
1.3246	404	0.8027	-
1.3311	406	0.9117	-
1.3377	408	0.7743	-
1.3443	410	0.8257	-
1.3508	412	0.8738	-
1.3574	414	0.972	-
1.3639	416	0.8297	-
1.3705	418	0.8941	-
1.3770	420	0.8513	-
1.3836	422	0.7588	-
1.3902	424	0.8332	-
1.3967	426	0.7682	-
1.4033	428	0.7916	-
1.4098	430	0.9519	-
1.4164	432	1.0526	-
1.4230	434	0.8724	-
1.4295	436	0.8267	-
1.4361	438	0.7672	-
1.4426	440	0.7977	-
1.4492	442	0.6947	-
1.4557	444	0.9042	-
1.4623	446	0.8971	-
1.4689	448	0.9655	-
1.4754	450	0.8512	-
1.4820	452	0.9421	-
1.4885	454	0.9501	-
1.4951	456	0.8214	-
1.5016	458	0.9335	-
1.5082	460	0.7617	-
1.5148	462	0.8601	0.7855
1.5213	464	0.757	-
1.5279	466	0.7389	-
1.5344	468	0.8146	-
1.5410	470	0.9235	-
1.5475	472	0.9901	-
1.5541	474	0.9624	-
1.5607	476	0.8909	-
1.5672	478	0.7276	-
1.5738	480	0.9444	-
1.5803	482	0.874	-
1.5869	484	0.7985	-
1.5934	486	0.9335	-
1.6	488	0.8108	-
1.6066	490	0.7779	-
1.6131	492	0.8807	-
1.6197	494	0.8146	-
1.6262	496	0.9218	-
1.6328	498	0.8439	-
1.6393	500	0.7348	-
1.6459	502	0.8533	-
1.6525	504	0.7695	-
1.6590	506	0.7911	-
1.6656	508	0.837	-
1.6721	510	0.731	-
1.6787	512	0.911	-
1.6852	514	0.7963	-
1.6918	516	0.7719	-
1.6984	518	0.8011	-
1.7049	520	0.7428	-
1.7115	522	0.8159	-
1.7180	524	0.7833	-
1.7246	526	0.7934	-
1.7311	528	0.7854	-
1.7377	530	0.8398	-
1.7443	532	0.7875	-
1.7508	534	0.7282	-
1.7574	536	0.8269	-
1.7639	538	0.8033	-
1.7672	539	-	0.7595
1.7705	540	0.9471	-
1.7770	542	0.941	-
1.7836	544	0.725	-
1.7902	546	0.8978	-
1.7967	548	0.8361	-
1.8033	550	0.7092	-
1.8098	552	0.809	-
1.8164	554	0.9399	-
1.8230	556	0.769	-
1.8295	558	0.7381	-
1.8361	560	0.7554	-
1.8426	562	0.8553	-
1.8492	564	0.919	-
1.8557	566	0.7479	-
1.8623	568	0.8381	-
1.8689	570	0.7911	-
1.8754	572	0.8076	-
1.8820	574	0.7868	-
1.8885	576	0.9147	-
1.8951	578	0.7271	-
1.9016	580	0.7201	-
1.9082	582	0.7538	-
1.9148	584	0.7522	-
1.9213	586	0.7737	-
1.9279	588	0.7187	-
1.9344	590	0.8713	-
1.9410	592	0.7971	-
1.9475	594	0.8226	-
1.9541	596	0.7074	-
1.9607	598	0.804	-
1.9672	600	0.7259	-
1.9738	602	0.7758	-
1.9803	604	0.8209	-
1.9869	606	0.7918	-
1.9934	608	0.7467	-
2.0	610	0.4324	-
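
The log supports a rough consistency check. Step 610 coincides with epoch 2.0, which gives 305 optimizer steps per epoch, and evaluation rows appear every 77 steps. The sketch below derives the range of training-set sizes compatible with those numbers. It assumes a single training device, which the card does not state; the batch size, gradient_accumulation_steps of 1 and dataloader_drop_last of False come from the hyperparameters above.

```python
total_steps = 610        # last step in the training log
num_epochs = 2           # num_train_epochs
batch_size = 32          # per_device_train_batch_size
num_devices = 1          # assumption: not stated in the model card

steps_per_epoch = total_steps // num_epochs
effective_batch = batch_size * num_devices

# With dataloader_drop_last False, the final batch of an epoch may be partial,
# so the dataset size N satisfies ceil(N / effective_batch) == steps_per_epoch.
max_examples = steps_per_epoch * effective_batch
min_examples = (steps_per_epoch - 1) * effective_batch + 1

print(steps_per_epoch)                # 305
print(min_examples, max_examples)     # 9729 9760
print(round(2 / steps_per_epoch, 4))  # 0.0066, the epoch value logged at step 2
```

Under these assumptions the training set would hold between 9,729 and 9,760 examples; a different device count would scale that range accordingly.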

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.42.3
PyTorch: 2.1.1+cu121
Accelerate: 0.31.0
Datasets: 2.17.1
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
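
For readers unfamiliar with the loss cited above: for each anchor, its own positive is treated as the correct class and every other candidate in the batch as a negative, and cross-entropy is applied over scaled cosine similarities. The following is a minimal pure-Python sketch of that idea, not the library's code. The helper names cosine and mnrl_loss are hypothetical, and the scale of 20.0 is an assumption, since the card does not list the loss parameters.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def mnrl_loss(anchors, candidates, scale=20.0):
    """Mean cross-entropy where candidates[i] is the positive for anchors[i].

    Candidates beyond len(anchors) (e.g. hard negatives from triplets)
    only ever act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy batch: two anchors, their two positives, plus one extra hard negative.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.1], [0.1, 1.0], [-1.0, -1.0]]
print(mnrl_loss(anchors, candidates))
```

The loss shrinks as each anchor moves closer to its own positive and further from the other candidates; if all candidates were equally similar to an anchor, the loss would equal the natural log of the number of candidates.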
