tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:498970
  - loss:BPRLoss
base_model: answerdotai/ModernBERT-large
widget:
  - source_sentence: lang last name
    sentences:
      - >-
        Lang is a moderately common surname in the United States. When the
        United States Census was taken in 2010, there were about 61,529
        individuals with the last name Lang, ranking it number 545 for all
        surnames. Historically, the name has been most prevalent in the Midwest,
        especially in North Dakota. Lang is least common in the southeastern
        states.
      - >-
        Flood Warning ...The National Weather Service in Houston/Galveston has
        issued a flood warning for the following rivers... Long King Creek At
        Livingston affecting the following counties in Texas... Polk...San
        Jacinto For the Long King Creek, at Livingston, Minor flooding is
        occuring and is expected to continue.
      - "Langston Name Meaning. English (mainly West Midlands): habitational name from any of various places, for example Langstone in Devon and Hampshire, named with Old English lang ‘long’, ‘tall’ + stan ‘stone’, i.e. a menhir."
  - source_sentence: average salary of a program manager in healthcare
    sentences:
      - >-
        What is the average annual salary for Compliance Manager-Healthcare? The
        annual salary for someone with the job title Compliance
        Manager-Healthcare may vary depending on a number of factors including
        industry, company size, location, years of experience and level of
        education.or example the median expected annual pay for a typical
        Compliance Manager-Healthcare in the United States is $92,278 so 50% of
        the people who perform the job of Compliance Manager-Healthcare in the
        United States are expected to make less than $92,278. Source: HR
        Reported data as of October 2015.
      - >-
        Average Program Manager Healthcare Salaries. The average salary for
        program manager healthcare jobs is $62,000. Average program manager
        healthcare salaries can vary greatly due to company, location, industry,
        experience and benefits. This salary was calculated using the average
        salary for all jobs with the term program manager healthcare anywhere in
        the job listing.
      - >-
        To apply for your IDNYC card, please follow these simple steps: Confirm
        you have the correct documents to apply. The IDNYC program uses a point
        system to determine if applicants are able to prove identity and
        residency in New York City. You will need three points worth of
        documents to prove your identity and a one point document to prove your
        residency.
  - source_sentence: when did brad paisley she's everything to me come out
    sentences:
      - >-
        Jump to: Overview (3) | Mini Bio (1) | Spouse (1) | Trivia (16) |
        Personal Quotes (59) Brad Paisley was born on October 28, 1972 in Glen
        Dale, West Virginia, USA as Brad Douglas Paisley. He has been married to
        Kimberly Williams-Paisley since March 15, 2003. They have two children.
      - >-
        A parasitic disease is an infectious disease caused or transmitted by a
        parasite. Many parasites do not cause diseases. Parasitic diseases can
        affect practically all living organisms, including plants and mammals.
        The study of parasitic diseases is called parasitology.erminology
        [edit]. Although organisms such as bacteria function as parasites, the
        usage of the term parasitic disease is usually more restricted. The
        three main types of organisms causing these conditions are protozoa
        (causing protozoan infection), helminths (helminthiasis), and
        ectoparasites.
      - >-
        She's Everything. She's Everything is a song co-written and recorded by
        American country music artist Brad Paisley. It reached the top of the
        Billboard Hot Country Songs Chart. It was released in August 2006 as the
        fourth and final single from Paisley's album Time Well Wasted. It was
        Paisley's seventh number one single.
  - source_sentence: who did lynda carter voice in elder scrolls
    sentences:
      - >-
        By Wade Steel. Bethesda Softworks announced today that actress Lynda
        Carter will join the voice cast for to its upcoming epic RPG The Elder
        Scrolls IV: Oblivion. The actress, best known for her television role as
        Wonder Woman, had previously provided her vocal talents for Elder
        Scrolls III: Morrowind and its Bloodmoon expansion.
      - "revise verb (STUDY). B1 [I or T] UK (US review) to study again something you have already learned, in preparation for an exam: We're revising (algebra) for the test tomorrow. (Definition of revise from the Cambridge Advanced Learner’s Dictionary & Thesaurus © Cambridge University Press)."
      - >-
        Lynda Carter (born Linda Jean Córdova Carter; July 24, 1951) is an
        American actress, singer, songwriter and beauty pageant titleholder who
        was crowned Miss World America 1972 and also the star of the TV series
        Wonder Woman from 1975 to 1979.
  - source_sentence: what county is phillips wi
    sentences:
      - >-
        Motto: It's not what you show, it's what you grow.. Location within
        Phillips County and Colorado. Holyoke is the Home Rule Municipality that
        is the county seat and the most populous municipality of Phillips
        County, Colorado, United States. The city population was 2,313 at the
        2010 census.
      - "Phillips is a city in Price County, Wisconsin, United States. The population was 1,675 at the 2000 census. It is the county seat of Price County. Phillips is located at 45°41′30″N 90°24′7″W / 45.69167°N 90.40194°W / 45.69167; -90.40194 (45.691560, -90.401915). It is on highway SR 13, 77 miles north of Marshfield, and 74 miles south of Ashland."
      - >-
        Various spellings from the numerous languages for Miller include
        Mueller, Mahler, Millar, Molenaar, Mills, Moeller, and Mullar. In
        Italian the surname is spelled Molinaro and in Spanish it is Molinero.
        The surname of Miller is most common in England, Scotland, United
        States, Germany, Spain and Italy. In the United States the name is
        seventh most common surname in the country.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
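For readers unfamiliar with the similarity function listed above, the following is a minimal pure-Python sketch of cosine similarity. The helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; they stand in for the 1024-dimensional embeddings and are not taken from the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 3.0, 0.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.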

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
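The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token embeddings. The sketch below illustrates that idea in pure Python under simplifying assumptions: `mean_pool` is a hypothetical helper, the vectors are 2-dimensional toys rather than 1024-dimensional, and the library's own implementation works on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens followed by one padding token that must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```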

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")
# Run inference
sentences = [
    'what county is phillips wi',
    'Phillips is a city in Price County, Wisconsin, United States. The population was 1,675 at the 2000 census. It is the county seat of Price County. Phillips is located at 45°41′30″N 90°24′7″W / 45.69167°N 90.40194°W / 45.69167; -90.40194 (45.691560, -90.401915). It is on highway SR 13, 77 miles north of Marshfield, and 74 miles south of Ashland.',
    "Motto: It's not what you show, it's what you grow.. Location within Phillips County and Colorado. Holyoke is the Home Rule Municipality that is the county seat and the most populous municipality of Phillips County, Colorado, United States. The city population was 2,313 at the 2010 census.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
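In the snippet above the first sentence is a query and the other two are candidate passages, so the first row of the similarity matrix can be used to rank the passages. The sketch below shows that ranking step only. `rank_passages` is a hypothetical helper, and the scores are made-up numbers for illustration, not output from this model.

```python
def rank_passages(passages, scores):
    """Return (passage, score) pairs sorted from most to least similar."""
    if len(passages) != len(scores):
        raise ValueError("one score is required per passage")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


# Illustrative labels and scores; with the real model these would come from
# the query's row of the similarity matrix, excluding the query itself.
passages = ["passage about Phillips County, Colorado", "passage about Phillips, Wisconsin"]
scores = [0.3, 0.8]

ranked = rank_passages(passages, scores)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```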

Training Details

Training Dataset

Unnamed Dataset

  • Size: 498,970 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 4 tokens, mean 9.24 tokens, max 27 tokens
    • sentence_1 (string): min 23 tokens, mean 83.71 tokens, max 279 tokens
    • sentence_2 (string): min 16 tokens, mean 80.18 tokens, max 262 tokens
  • Samples:
    • sentence_0: what is tongkat ali
      sentence_1: Tongkat Ali is a very powerful herb that acts as a sex enhancer by naturally increasing the testosterone levels, and revitalizing sexual impotence, performance and pleasure. Tongkat Ali is also effective in building muscular volume & strength resulting to a healthy physique.
      sentence_2: However, unlike tongkat ali extract, tongkat ali chipped root and root powder are not sterile. Thus, the raw consumption of root powder is not recommended. The traditional preparation in Indonesia and Malaysia is to boil chipped roots as a tea.
    • sentence_0: cost to install engineered hardwood flooring
      sentence_1: Burton says his customers typically spend about $8 per square foot for engineered hardwood flooring; add an additional $2 per square foot for installation. Minion says consumers should expect to pay $7 to $12 per square foot for quality hardwood flooring. “If the homeowner buys the wood and you need somebody to install it, usually an installation goes for about $2 a square foot,” Bill LeBeau, owner of LeBeau’s Hardwood Floors of Huntersville, North Carolina, says.
      sentence_2: Engineered Wood Flooring Installation - Average Cost Per Square Foot. Expect to pay in the higher end of the price range for a licensed, insured and reputable pro - and for complex or rush projects. To lower Engineered Wood Flooring Installation costs: combine related projects, minimize options/extras and be flexible about project scheduling.
    • sentence_0: define pollute
      sentence_1: pollutes; polluted; polluting. Learner's definition of POLLUTE. [+ object] : to make (land, water, air, etc.) dirty and not safe or suitable to use. Waste from the factory had polluted [=contaminated] the river. Miles of beaches were polluted by the oil spill. Car exhaust pollutes the air.
      sentence_2: Chemical water pollution. Industrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it.1 Metals and solvents from industrial work can pollute rivers and lakes.2 These are poisonous to many forms of aquatic life and may slow their development, make them infertile or even result in death.ndustrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it. 1 Metals and solvents from industrial work can pollute rivers and lakes.
  • Loss: beir.losses.bpr_loss.BPRLoss
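The card names the loss but does not describe it. Assuming it follows the Binary Passage Retriever recipe, one ingredient is a cross-entropy term over in-batch candidates: each query should score its own positive passage higher than every other passage in the batch. The sketch below illustrates only that ingredient in pure Python. It omits the ranking term over binary codes, uses made-up scores, and is not BEIR's implementation; `in_batch_cross_entropy` is a hypothetical name.

```python
import math


def in_batch_cross_entropy(scores):
    """Mean cross-entropy where scores[i][j] is the similarity of query i to
    passage j, and passage i is assumed to be the positive for query i."""
    losses = []
    for i, row in enumerate(scores):
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in row))
        losses.append(log_sum_exp - row[i])
    return sum(losses) / len(losses)


# Made-up score matrices for a batch of two queries and two passages.
well_separated = [[5.0, 0.0], [0.0, 5.0]]   # positives clearly win
confused = [[0.0, 5.0], [5.0, 0.0]]         # negatives clearly win
print(in_batch_cross_entropy(well_separated))
print(in_batch_cross_entropy(confused))
```

The loss is small when each positive outranks the other candidates and large when it does not, which is the behavior the falling values in the training logs would be consistent with.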

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 6
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
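The list above gives `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and no warmup. As a rough sketch of what those settings imply, the function below decays the learning rate linearly to zero over the run. The total of 46782 steps is taken from the last row of the training logs; `linear_lr` is a hypothetical helper and not the scheduler code used during training.

```python
def linear_lr(step, base_lr=5e-5, total_steps=46782, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
for step in (0, 23391, 46782):
    print(step, linear_lr(step))
```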

Training Logs

Epoch Step Training Loss
0.0641 500 1.4036
0.1283 1000 0.36
0.1924 1500 0.3305
0.2565 2000 0.2874
0.3206 2500 0.2732
0.3848 3000 0.2446
0.4489 3500 0.2399
0.5130 4000 0.2302
0.5771 4500 0.231
0.6413 5000 0.2217
0.7054 5500 0.2192
0.7695 6000 0.2087
0.8337 6500 0.2104
0.8978 7000 0.2069
0.9619 7500 0.2071
1.0 7797 -
1.0260 8000 0.1663
1.0902 8500 0.1213
1.1543 9000 0.1266
1.2184 9500 0.1217
1.2825 10000 0.1193
1.3467 10500 0.1198
1.4108 11000 0.1258
1.4749 11500 0.1266
1.5391 12000 0.1334
1.6032 12500 0.1337
1.6673 13000 0.1258
1.7314 13500 0.1268
1.7956 14000 0.1249
1.8597 14500 0.1256
1.9238 15000 0.1238
1.9879 15500 0.1274
2.0 15594 -
2.0521 16000 0.0776
2.1162 16500 0.0615
2.1803 17000 0.0647
2.2445 17500 0.0651
2.3086 18000 0.0695
2.3727 18500 0.0685
2.4368 19000 0.0685
2.5010 19500 0.0707
2.5651 20000 0.073
2.6292 20500 0.0696
2.6933 21000 0.0694
2.7575 21500 0.0701
2.8216 22000 0.0668
2.8857 22500 0.07
2.9499 23000 0.0649
3.0 23391 -
3.0140 23500 0.0589
3.0781 24000 0.0316
3.1422 24500 0.0377
3.2064 25000 0.039
3.2705 25500 0.0335
3.3346 26000 0.0387
3.3987 26500 0.0367
3.4629 27000 0.0383
3.5270 27500 0.0407
3.5911 28000 0.0372
3.6553 28500 0.0378
3.7194 29000 0.0359
3.7835 29500 0.0394
3.8476 30000 0.0388
3.9118 30500 0.0422
3.9759 31000 0.0391
4.0 31188 -
4.0400 31500 0.0251
4.1041 32000 0.0199
4.1683 32500 0.0261
4.2324 33000 0.021
4.2965 33500 0.0196
4.3607 34000 0.0181
4.4248 34500 0.0228
4.4889 35000 0.0195
4.5530 35500 0.02
4.6172 36000 0.0251
4.6813 36500 0.0213
4.7454 37000 0.0208
4.8095 37500 0.0192
4.8737 38000 0.0204
4.9378 38500 0.0176
5.0 38985 -
5.0019 39000 0.0184
5.0661 39500 0.0136
5.1302 40000 0.0102
5.1943 40500 0.0122
5.2584 41000 0.0124
5.3226 41500 0.013
5.3867 42000 0.0105
5.4508 42500 0.0135
5.5149 43000 0.0158
5.5791 43500 0.015
5.6432 44000 0.0128
5.7073 44500 0.0105
5.7715 45000 0.014
5.8356 45500 0.0125
5.8997 46000 0.0139
5.9638 46500 0.0137
6.0 46782 -
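The step counts in the table can be cross-checked against the dataset size and batch size reported earlier. The arithmetic below assumes a single device, which the card does not state explicitly, together with the listed `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`.

```python
import math

num_samples = 498_970   # training samples
batch_size = 64         # per_device_train_batch_size
num_epochs = 6          # num_train_epochs

# The last, partially filled batch still counts as a step (drop_last is False).
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 7797
print(total_steps)      # 46782
print(epoch_at(500))    # 0.0641
```

These values match the epoch boundaries (7797, 15594, ..., 46782) and the first logged row (epoch 0.0641 at step 500).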

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}