How to use calvin2258000/test2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("calvin2258000/test2")
sentences = [
"| :-- | :--: | :--: || After the coming into force date | 1.00 | 1.00 || 6 April 2014 - coming into force date | 1.30 | 1.30 || 1 Oct 2010 - 5 April 2014 | 1.40 | 1.40 || 6 April 2006 - 30 Sept 2010 | 1.67 | 1.67 || Pre 6 April 2006 | 1.67 | 1.67 |",
"What are the applicable building regulation change factors for different time periods starting from pre-6 April 2006 to the current date?",
"When are protected lobbies or corridors required for staircases in buildings with multiple storeys above ground level?",
"What are the standards for repairing, reconstructing, and altering existing drains and sewers?"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
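To make the last two modules of the architecture concrete, here is a minimal sketch in plain Python of mean pooling over non-padding tokens followed by L2 normalization. The function names and the toy token vectors are illustrative assumptions, not part of the library; the real modules operate on batched tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


def l2_normalize(vector):
    """Scale a vector to unit length, mirroring the Normalize() step."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


# Toy input (hypothetical): three real tokens and one padding token.
tokens = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_vector = l2_normalize(pooled)
print(pooled, sentence_vector)
```

Because the output is unit length, the dot product of two such vectors equals their cosine similarity.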
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("calvin2258000/test2")
# Run inference
sentences = [
'Appendix A:Key terms ..... 11Appendix B:Standards referred to ..... 12# Approved Document P:Electrical safety Dwellings## Summary0.1 This approved document gives guidance on how to comply with Part P of the Building Regulations.It contains the following sections:Section 1:Technical requirements for electrical work in dwellingsSection 2:The types of building and electrical installation within the scope of Part P, and the types of electrical work that are notifiableSection 3:The different procedures that may be followed to show that electrical work complies with Part PAppendix A:Key terms',
'What guidance does Approved Document P provide for complying with Part P of the Building Regulations regarding electrical work in dwellings?',
"What conditions classify a district heat network as 'under construction' based on the building regulations defined on 15 June 2022?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
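Since the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64, its embeddings can plausibly be shortened to one of those sizes by keeping the leading values and rescaling. The sketch below shows that operation in plain Python on toy 8-dimensional vectors; the helper names and the sample numbers are assumptions made for illustration. Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which should achieve the same effect on real embeddings.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for 768-dimensional model output.
passage = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1]
query = [0.4, 0.5, 0.2, 0.3, 0.0, 0.2, 0.1, 0.0]

# Compare similarity at the full size and at two truncated sizes.
for dim in (8, 4, 2):
    p = truncate_and_normalize(passage, dim)
    q = truncate_and_normalize(query, dim)
    print(dim, round(cosine(p, q), 4))
```

Smaller sizes reduce storage and search cost; the evaluation table in this card shows how much retrieval quality each size gives up.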
Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
Evaluated with InformationRetrievalEvaluator

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.6569 | 0.6569 | 0.6472 | 0.6107 | 0.545 |
| cosine_accuracy@3 | 0.8443 | 0.8443 | 0.837 | 0.8127 | 0.7494 |
| cosine_accuracy@5 | 0.8905 | 0.8881 | 0.8881 | 0.8516 | 0.7932 |
| cosine_accuracy@10 | 0.9367 | 0.944 | 0.944 | 0.9294 | 0.8637 |
| cosine_precision@1 | 0.6569 | 0.6569 | 0.6472 | 0.6107 | 0.545 |
| cosine_precision@3 | 0.2814 | 0.2814 | 0.279 | 0.2709 | 0.2498 |
| cosine_precision@5 | 0.1781 | 0.1776 | 0.1776 | 0.1703 | 0.1586 |
| cosine_precision@10 | 0.0937 | 0.0944 | 0.0944 | 0.0929 | 0.0864 |
| cosine_recall@1 | 0.6569 | 0.6569 | 0.6472 | 0.6107 | 0.545 |
| cosine_recall@3 | 0.8443 | 0.8443 | 0.837 | 0.8127 | 0.7494 |
| cosine_recall@5 | 0.8905 | 0.8881 | 0.8881 | 0.8516 | 0.7932 |
| cosine_recall@10 | 0.9367 | 0.944 | 0.944 | 0.9294 | 0.8637 |
| cosine_ndcg@10 | 0.8031 | 0.8037 | 0.7995 | 0.7725 | 0.7081 |
| cosine_mrr@10 | 0.7597 | 0.7586 | 0.7528 | 0.7223 | 0.6579 |
| cosine_map@100 | 0.7624 | 0.7605 | 0.755 | 0.7254 | 0.6636 |
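In the table above, accuracy@k equals recall@k, and precision@k is close to accuracy@k divided by k (for example 0.8443 / 3 ≈ 0.2814). That pattern is what one would expect if every query has exactly one relevant passage, although the card does not say so. Under that assumption, the metrics can be sketched as follows; the function name and the sample ranks are hypothetical.

```python
import math


def retrieval_metrics(ranks, k):
    """Compute accuracy, precision, recall, MRR and NDCG at cutoff k.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None when the passage was not retrieved.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    precision = accuracy / k  # at most one relevant passage in the top k
    recall = accuracy         # exactly one relevant passage per query
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    # With one relevant passage the ideal DCG is 1, so NDCG is the discounted gain.
    ndcg = sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Hypothetical ranks for four queries; the third query missed entirely.
ranks = [1, 2, None, 4]
metrics = retrieval_metrics(ranks, k=3)
print(metrics)
```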
Columns: positive and anchor

| | positive | anchor |
|---|---|---|
| type | string | string |

Samples:
| positive | anchor |
|---|---|
| # Section 3:Subsoil drainage3.1 The provisions which follow assume that the site of the building is not subject to general flooding (see paragraph 0.8 ) or, if it is, that appropriate steps are being taken.3.2 Where the water table can rise to within 0.25 m of the lowest floor of the building, or where surface water could enter or adversely affect the building, either the ground to be covered by the building should be drained by gravity, or other effective means of safeguarding the building should be taken.3.3 If an active subsoil drain is cut during excavation and if it passes under the building it should be:a. re-laid in pipes with sealed joints and have access points outside the building; orb. re-routed around the building; orc. re-run to another outfall (see Diagram 3).3.4 Where there is a risk that groundwater beneath or around the building could adversely affect the stability and properties of the ground, consideration should be given to site drainage or other protection (see Sec... | What measures should be taken if a building site has a water table that can rise to within 0.25 meters of the lowest floor? |
| Easily accessibleEither:- a window or doorway, any part of which is within 2 m vertically of an accessible level surface such as the ground or basement level, or an access balcony, or- a window within 2 m vertically of a flat or sloping roof (with a pitch of less than $30^{\circ}$ ) that is within 3.5 m of ground level.Coupled assemblyA doorset and window that are supplied as separate self-contained frames and fixed together on site. | What criteria determine if a window or doorway is considered easily accessible in a building? |
| Fuels such as bituminous coal, untreated wood or compressed paper are not smokeless or low-volatiles fuels.3.These appliances are known as 'exempted fireplaces'.2.7 For fireplaces with openings larger than $500 \mathrm{ | What is the required flue cross-sectional area for fireplaces with openings larger than 500 mm x 550 mm or exposed on multiple sides? |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
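The parameters above describe a weighted sum of one inner loss evaluated on truncated embeddings. The following plain-Python sketch illustrates that idea, assuming the inner MultipleNegativesRankingLoss is a cross-entropy over scaled cosine similarities with in-batch negatives. The scale of 20.0, the function names, and the toy vectors are assumptions for illustration; this is not the library's implementation. A value of -1 for `n_dims_per_step` appears to mean that every listed dimension is used at each step, which is what the loop does.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over each truncated embedding size."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        total += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return total


# Toy 4-dimensional batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
print(loss)
```

With equal weights, as configured here, each embedding size contributes equally to the gradient, so the leading dimensions are pushed to carry most of the retrieval signal.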
Non-default hyperparameters:
eval_strategy: epoch
per_device_train_batch_size: 32
per_device_eval_batch_size: 16
gradient_accumulation_steps: 16
learning_rate: 2e-05
num_train_epochs: 10
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: True
tf32: True
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 16
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: True
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 1.0 | 8 | - | 0.7456 | 0.7485 | 0.7467 | 0.7117 | 0.6394 |
| 1.2759 | 10 | 33.8924 | - | - | - | - | - |
| 2.0 | 16 | - | 0.7781 | 0.7773 | 0.7785 | 0.7466 | 0.6837 |
| 2.5517 | 20 | 10.7256 | - | - | - | - | - |
| 3.0 | 24 | - | 0.7935 | 0.7870 | 0.7888 | 0.7535 | 0.7016 |
| 3.8276 | 30 | 5.5408 | - | - | - | - | - |
| 4.0 | 32 | - | 0.8000 | 0.7962 | 0.7969 | 0.7585 | 0.7082 |
| 5.0 | 40 | 3.4556 | 0.8017 | 0.8011 | 0.7992 | 0.7644 | 0.7082 |
| 6.0 | 48 | - | 0.8037 | 0.8021 | 0.7974 | 0.7692 | 0.7082 |
| 6.2759 | 50 | 2.9963 | - | - | - | - | - |
| 7.0 | 56 | - | 0.8025 | 0.8013 | 0.7987 | 0.7719 | 0.7072 |
| 7.5517 | 60 | 3.1681 | - | - | - | - | - |
| 8.0 | 64 | - | 0.8035 | 0.8024 | 0.7996 | 0.7723 | 0.7077 |
| 8.8276 | 70 | 2.5551 | 0.8031 | 0.8037 | 0.7995 | 0.7725 | 0.7081 |
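The hyperparameters combine a learning rate of 2e-05 with a cosine scheduler and a warmup ratio of 0.1. As a rough sketch of what that schedule looks like, the function below follows the usual linear-warmup-then-cosine-decay shape. The total of 80 optimizer steps is an assumption inferred from the logs (about 8 steps per epoch over 10 epochs), and the function name is hypothetical. The effective batch size per device follows directly from the listed batch size and gradient accumulation.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 80  # assumption: roughly 8 optimizer steps per epoch x 10 epochs

# per_device_train_batch_size x gradient_accumulation_steps
effective_batch = 32 * 16

for step in (0, 4, 8, 40, 80):
    print(step, lr_at_step(step, TOTAL_STEPS))
```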
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
answerdotai/ModernBERT-base