ModernBERT Embed base Legal Matryoshka

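This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss, so embeddings can also be truncated to 512, 256, 128, or 64 dimensions at some cost in retrieval quality.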

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
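
The Pooling and Normalize stages listed above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function names `mean_pool` and `l2_normalize` and the 4-wide toy vectors are assumptions for illustration (the real model produces 768-wide token vectors).

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, as a Normalize stage would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three token vectors of width 4; the last token is padding.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output is unit length, the cosine similarity of two such embeddings reduces to their dot product.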

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("calvin2258000/test2")
# Run inference
sentences = [
    'Appendix A:Key terms ..... 11Appendix B:Standards referred to ..... 12# Approved Document P:Electrical safety Dwellings## Summary0.1 This approved document gives guidance on how to comply with Part P of the Building Regulations.It contains the following sections:Section 1:Technical requirements for electrical work in dwellingsSection 2:The types of building and electrical installation within the scope of Part P, and the types of electrical work that are notifiableSection 3:The different procedures that may be followed to show that electrical work complies with Part PAppendix A:Key terms',
    'What guidance does Approved Document P provide for complying with Part P of the Building Regulations regarding electrical work in dwellings?',
    "What conditions classify a district heat network as 'under construction' based on the building regulations defined on 15 June 2022?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
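
Since the model was trained with MatryoshkaLoss, a shorter embedding can be obtained by keeping only the leading dimensions. The sketch below shows the idea on toy 8-dimensional vectors in pure Python; the helper names and the toy values are assumptions for illustration, not part of this model's API. Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which may be the more convenient route in practice.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

# Toy "full" embeddings of width 8 (the real model outputs 768 values).
full_a = [3.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
full_b = [4.0, 3.0, 0.0, 0.0, -1.0, -1.0, -1.0, -1.0]

# Truncate to the first 4 dimensions, then compare.
short_a = truncate_and_normalize(full_a, 4)
short_b = truncate_and_normalize(full_b, 4)
similarity = dot(short_a, short_b)
print(similarity)
```

Renormalizing after truncation matters when similarity is computed as a plain dot product, because a truncated vector is generally no longer unit length.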

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.6569 0.6569 0.6472 0.6107 0.545
cosine_accuracy@3 0.8443 0.8443 0.837 0.8127 0.7494
cosine_accuracy@5 0.8905 0.8881 0.8881 0.8516 0.7932
cosine_accuracy@10 0.9367 0.944 0.944 0.9294 0.8637
cosine_precision@1 0.6569 0.6569 0.6472 0.6107 0.545
cosine_precision@3 0.2814 0.2814 0.279 0.2709 0.2498
cosine_precision@5 0.1781 0.1776 0.1776 0.1703 0.1586
cosine_precision@10 0.0937 0.0944 0.0944 0.0929 0.0864
cosine_recall@1 0.6569 0.6569 0.6472 0.6107 0.545
cosine_recall@3 0.8443 0.8443 0.837 0.8127 0.7494
cosine_recall@5 0.8905 0.8881 0.8881 0.8516 0.7932
cosine_recall@10 0.9367 0.944 0.944 0.9294 0.8637
cosine_ndcg@10 0.8031 0.8037 0.7995 0.7725 0.7081
cosine_mrr@10 0.7597 0.7586 0.7528 0.7223 0.6579
cosine_map@100 0.7624 0.7605 0.755 0.7254 0.6636
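
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.8443 / 3 ≈ 0.2814). That pattern is what one would expect if each query has exactly one relevant passage, which is assumed in the sketch below. The function `rank_metrics` and the toy ranks are hypothetical and only illustrate how the metrics relate to one another.

```python
import math

def rank_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: 1-based rank of the single relevant passage for each query,
           or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one relevant item in the top k
        "recall": accuracy,         # one relevant item per query
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With a single relevant item the ideal DCG is 1, so NDCG = DCG.
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Toy example: five queries and the rank at which each answer was found.
toy_ranks = [1, 2, 1, None, 4]
metrics_at_3 = rank_metrics(toy_ranks, k=3)
print(metrics_at_3)
```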

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,692 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    Column    Type    Min tokens  Mean tokens  Max tokens
    positive  string  3           176.61       742
    anchor    string  12          23.56        51
  • Samples:
    Sample 1
    positive: # Section 3:Subsoil drainage3.1 The provisions which follow assume that the site of the building is not subject to general flooding (see paragraph 0.8 ) or, if it is, that appropriate steps are being taken.3.2 Where the water table can rise to within 0.25 m of the lowest floor of the building, or where surface water could enter or adversely affect the building, either the ground to be covered by the building should be drained by gravity, or other effective means of safeguarding the building should be taken.3.3 If an active subsoil drain is cut during excavation and if it passes under the building it should be:a. re-laid in pipes with sealed joints and have access points outside the building; orb. re-routed around the building; orc. re-run to another outfall (see Diagram 3).3.4 Where there is a risk that groundwater beneath or around the building could adversely affect the stability and properties of the ground, consideration should be given to site drainage or other protection (see Sec...
    anchor: What measures should be taken if a building site has a water table that can rise to within 0.25 meters of the lowest floor?
    Sample 2
    positive: Easily accessibleEither:- a window or doorway, any part of which is within 2 m vertically of an accessible level surface such as the ground or basement level, or an access balcony, or- a window within 2 m vertically of a flat or sloping roof (with a pitch of less than $30^{\circ}$ ) that is within 3.5 m of ground level.Coupled assemblyA doorset and window that are supplied as separate self-contained frames and fixed together on site.
    anchor: What criteria determine if a window or doorway is considered easily accessible in a building?
    Sample 3
    positive: Fuels such as bituminous coal, untreated wood or compressed paper are not smokeless or low-volatiles fuels.3.These appliances are known as 'exempted fireplaces'.2.7 For fireplaces with openings larger than $500 \mathrm{mm} \times 550 \mathrm{mm}$ or fireplaces exposed on two or more sides (such as a fireplace under a canopy or open on both sides of a central chimney breast) a way of showing compliance would be to provide a flue with a cross-sectional area equal to 15 per cent of the total face area of the fireplace opening(s) (see Appendix B).However, specialist advice should be sought when proposing to construct flues having an area of:a. more than 15 per cent of the total face area of the fireplace openings; or
    anchor: What is the required flue cross-sectional area for fireplaces with openings larger than 500 mm x 550 mm or exposed on multiple sides?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
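
The configuration above combines an in-batch ranking loss with several truncation widths. The following pure-Python sketch shows one way to read it: the ranking loss is evaluated on each truncated prefix of the embeddings and the weighted results are summed. It is a simplified illustration rather than the library's code; the function names, the toy vectors, and the similarity scale of 20 are assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy: anchor i should match positive i,
    while every other positive in the batch acts as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(ai, pj)) for pj in p]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.1, 0.2], [0.1, 0.8, 0.3, 0.2]]

loss_matched = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
loss_swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(loss_matched, loss_swapped)
```

With equal weights, as in the configuration above, every width contributes equally, which pushes the leading dimensions to remain useful on their own.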
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
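
Two of these settings interact in ways that are easy to misread, so a short worked sketch may help. It computes the effective batch size implied by the batch size and gradient accumulation, and a cosine learning-rate curve with linear warmup. The device count of 1, the total of 70 steps (the last step shown in the Training Logs table), and the function `lr_at` are assumptions for illustration; the trainer's exact schedule may differ.

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_devices = 1  # assumption: the card does not state how many GPUs were used

# Number of samples that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

def lr_at(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero (a common formulation)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 70  # assumption: taken from the last logged training step
schedule = [lr_at(s, total_steps) for s in range(total_steps + 1)]
print(effective_batch_size, max(schedule), schedule[-1])
```

Gradient accumulation enlarges the update batch but not the pool of in-batch negatives: with MultipleNegativesRankingLoss, each anchor is still contrasted only against the other positives in its own batch of 32.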

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
1.0 8 - 0.7456 0.7485 0.7467 0.7117 0.6394
1.2759 10 33.8924 - - - - -
2.0 16 - 0.7781 0.7773 0.7785 0.7466 0.6837
2.5517 20 10.7256 - - - - -
3.0 24 - 0.7935 0.7870 0.7888 0.7535 0.7016
3.8276 30 5.5408 - - - - -
4.0 32 - 0.8000 0.7962 0.7969 0.7585 0.7082
5.0 40 3.4556 0.8017 0.8011 0.7992 0.7644 0.7082
6.0 48 - 0.8037 0.8021 0.7974 0.7692 0.7082
6.2759 50 2.9963 - - - - -
7.0 56 - 0.8025 0.8013 0.7987 0.7719 0.7072
7.5517 60 3.1681 - - - - -
8.0 64 - 0.8035 0.8024 0.7996 0.7723 0.7077
8.8276 70 2.5551 0.8031 0.8037 0.7995 0.7725 0.7081
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.4.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

  • Model size: 0.1B params
  • Tensor type: F32