SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
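
These values can be read off the loaded model directly; a quick sanity check using the Sentence Transformers API:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")
print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 1024
print(model.similarity_fn_name)                  # cosine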

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
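
The Pooling block above uses mean pooling (pooling_mode_mean_tokens: True): the sentence embedding is the attention-mask-weighted average of the token embeddings produced by ModernBERT. A minimal sketch of the equivalent computation with the plain transformers API (loading the backbone directly from this repo is an assumption that holds for the standard Sentence Transformers repo layout):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")
backbone = AutoModel.from_pretrained("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")

enc = tokenizer(["what county is phillips wi"], return_tensors="pt")
with torch.no_grad():
    token_embs = backbone(**enc).last_hidden_state   # (batch, seq, 1024)
mask = enc["attention_mask"].unsqueeze(-1).float()   # zero out padding positions
embedding = (token_embs * mask).sum(1) / mask.sum(1) # mean over real tokens only
print(embedding.shape)                               # torch.Size([1, 1024])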

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")
# Run inference
sentences = [
    'what county is phillips wi',
    'Phillips is a city in Price County, Wisconsin, United States. The population was 1,675 at the 2000 census. It is the county seat of Price County. Phillips is located at 45°41′30″N 90°24′7″W / 45.69167°N 90.40194°W / 45.69167; -90.40194 (45.691560, -90.401915). It is on highway SR 13, 77 miles north of Marshfield, and 74 miles south of Ashland.',
    "Motto: It's not what you show, it's what you grow.. Location within Phillips County and Colorado. Holyoke is the Home Rule Municipality that is the county seat and the most populous municipality of Phillips County, Colorado, United States. The city population was 2,313 at the 2010 census.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
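
The model name and the training samples below point at MS MARCO-style passage ranking, so a natural application is semantic search. A small sketch with a made-up toy corpus:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")

# Made-up passages for illustration
corpus = [
    "Phillips is a city in Price County, Wisconsin, United States.",
    "Holyoke is the county seat of Phillips County, Colorado.",
    "Tongkat Ali is an herb used as a natural testosterone booster.",
]
query = "what county is phillips wi"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity between the query and every passage
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())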

Training Details

Training Dataset

Unnamed Dataset

  • Size: 498,970 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 4 tokens, mean: 9.24 tokens, max: 27 tokens
    • sentence_1: string; min: 23 tokens, mean: 83.71 tokens, max: 279 tokens
    • sentence_2: string; min: 16 tokens, mean: 80.18 tokens, max: 262 tokens
  • Samples:
    • sentence_0: what is tongkat ali
      sentence_1: Tongkat Ali is a very powerful herb that acts as a sex enhancer by naturally increasing the testosterone levels, and revitalizing sexual impotence, performance and pleasure. Tongkat Ali is also effective in building muscular volume & strength resulting to a healthy physique.
      sentence_2: However, unlike tongkat ali extract, tongkat ali chipped root and root powder are not sterile. Thus, the raw consumption of root powder is not recommended. The traditional preparation in Indonesia and Malaysia is to boil chipped roots as a tea.
    • sentence_0: cost to install engineered hardwood flooring
      sentence_1: Burton says his customers typically spend about $8 per square foot for engineered hardwood flooring; add an additional $2 per square foot for installation. Minion says consumers should expect to pay $7 to $12 per square foot for quality hardwood flooring. “If the homeowner buys the wood and you need somebody to install it, usually an installation goes for about $2 a square foot,” Bill LeBeau, owner of LeBeau’s Hardwood Floors of Huntersville, North Carolina, says.
      sentence_2: Engineered Wood Flooring Installation - Average Cost Per Square Foot. Expect to pay in the higher end of the price range for a licensed, insured and reputable pro - and for complex or rush projects. To lower Engineered Wood Flooring Installation costs: combine related projects, minimize options/extras and be flexible about project scheduling.
    • sentence_0: define pollute
      sentence_1: pollutes; polluted; polluting. Learner's definition of POLLUTE. [+ object] : to make (land, water, air, etc.) dirty and not safe or suitable to use. Waste from the factory had polluted [=contaminated] the river. Miles of beaches were polluted by the oil spill. Car exhaust pollutes the air.
      sentence_2: Chemical water pollution. Industrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it. 1 Metals and solvents from industrial work can pollute rivers and lakes. 2 These are poisonous to many forms of aquatic life and may slow their development, make them infertile or even result in death. Industrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it. 1 Metals and solvents from industrial work can pollute rivers and lakes.
  • Loss: beir.losses.bpr_loss.BPRLoss
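
BPR (Binary Passage Retrieval, Yamada et al., 2021) learns embeddings that stay useful after being hashed to binary codes: it combines an in-batch cross-entropy loss on dense dot-product scores with a ranking loss on binarized embeddings. The exact beir.losses.bpr_loss.BPRLoss code is not reproduced in this card; the sketch below only illustrates the shape of such an objective, with the margin value and the tanh binarization surrogate as assumptions:

import torch
import torch.nn.functional as F

def bpr_style_loss(q, p, margin=2.0):
    """Illustrative BPR-style objective; not the beir implementation."""
    # q: (B, D) query embeddings, p: (B, D) positive passage embeddings.
    # Every other row of p serves as an in-batch negative for q[i].
    labels = torch.arange(q.size(0), device=q.device)

    # Dense part: cross-entropy over in-batch dot-product scores.
    dense_scores = q @ p.T                              # (B, B)
    loss_dense = F.cross_entropy(dense_scores, labels)

    # Binary part: ranking loss on tanh-relaxed codes (sign() at index time).
    qb, pb = torch.tanh(q), torch.tanh(p)
    bin_scores = qb @ pb.T                              # (B, B)
    pos = bin_scores.diagonal().unsqueeze(1)            # (B, 1) positive scores
    neg_mask = ~torch.eye(q.size(0), dtype=torch.bool, device=q.device)
    # Hinge loss pushing each positive above its in-batch negatives.
    loss_rank = F.relu(margin - (pos - bin_scores))[neg_mask].mean()

    return loss_dense + loss_rank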

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 6
  • multi_dataset_batch_sampler: round_robin
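
A hedged sketch of how a comparable run could be launched with the SentenceTransformerTrainer API, using the hyperparameters listed above. The toy triplets stand in for the unnamed 498,970-sample dataset, and the BPRLoss constructor signature is an assumption:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from beir.losses.bpr_loss import BPRLoss  # the loss named under Training Dataset

model = SentenceTransformer("answerdotai/ModernBERT-large")

# Toy stand-in for the unnamed (query, positive, negative) triplet dataset
train_dataset = Dataset.from_dict({
    "sentence_0": ["what is tongkat ali"],
    "sentence_1": ["Tongkat Ali is an herb used to boost testosterone."],
    "sentence_2": ["Holyoke is the county seat of Phillips County, Colorado."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-large-msmarco-bpr",
    num_train_epochs=6,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    eval_strategy="steps",  # requires an eval_dataset; the toy set is reused here
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,
    loss=BPRLoss(model),  # constructor signature assumed
)
trainer.train()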

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0641 500 1.4036
0.1283 1000 0.36
0.1924 1500 0.3305
0.2565 2000 0.2874
0.3206 2500 0.2732
0.3848 3000 0.2446
0.4489 3500 0.2399
0.5130 4000 0.2302
0.5771 4500 0.231
0.6413 5000 0.2217
0.7054 5500 0.2192
0.7695 6000 0.2087
0.8337 6500 0.2104
0.8978 7000 0.2069
0.9619 7500 0.2071
1.0 7797 -
1.0260 8000 0.1663
1.0902 8500 0.1213
1.1543 9000 0.1266
1.2184 9500 0.1217
1.2825 10000 0.1193
1.3467 10500 0.1198
1.4108 11000 0.1258
1.4749 11500 0.1266
1.5391 12000 0.1334
1.6032 12500 0.1337
1.6673 13000 0.1258
1.7314 13500 0.1268
1.7956 14000 0.1249
1.8597 14500 0.1256
1.9238 15000 0.1238
1.9879 15500 0.1274
2.0 15594 -
2.0521 16000 0.0776
2.1162 16500 0.0615
2.1803 17000 0.0647
2.2445 17500 0.0651
2.3086 18000 0.0695
2.3727 18500 0.0685
2.4368 19000 0.0685
2.5010 19500 0.0707
2.5651 20000 0.073
2.6292 20500 0.0696
2.6933 21000 0.0694
2.7575 21500 0.0701
2.8216 22000 0.0668
2.8857 22500 0.07
2.9499 23000 0.0649
3.0 23391 -
3.0140 23500 0.0589
3.0781 24000 0.0316
3.1422 24500 0.0377
3.2064 25000 0.039
3.2705 25500 0.0335
3.3346 26000 0.0387
3.3987 26500 0.0367
3.4629 27000 0.0383
3.5270 27500 0.0407
3.5911 28000 0.0372
3.6553 28500 0.0378
3.7194 29000 0.0359
3.7835 29500 0.0394
3.8476 30000 0.0388
3.9118 30500 0.0422
3.9759 31000 0.0391
4.0 31188 -
4.0400 31500 0.0251
4.1041 32000 0.0199
4.1683 32500 0.0261
4.2324 33000 0.021
4.2965 33500 0.0196
4.3607 34000 0.0181
4.4248 34500 0.0228
4.4889 35000 0.0195
4.5530 35500 0.02
4.6172 36000 0.0251
4.6813 36500 0.0213
4.7454 37000 0.0208
4.8095 37500 0.0192
4.8737 38000 0.0204
4.9378 38500 0.0176
5.0 38985 -
5.0019 39000 0.0184
5.0661 39500 0.0136
5.1302 40000 0.0102
5.1943 40500 0.0122
5.2584 41000 0.0124
5.3226 41500 0.013
5.3867 42000 0.0105
5.4508 42500 0.0135
5.5149 43000 0.0158
5.5791 43500 0.015
5.6432 44000 0.0128
5.7073 44500 0.0105
5.7715 45000 0.014
5.8356 45500 0.0125
5.8997 46000 0.0139
5.9638 46500 0.0137
6.0 46782 -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}