How to use nancy-noubou/bge-base-iso-clauses-v1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nancy-noubou/bge-base-iso-clauses-v1")
sentences = [
"Represent this sentence for searching relevant passages: The organization shall conduct internal audits at planned intervals to provide information on whether the information security management system: a) conforms to 1) the organization’s own requirements for its information security management system; 2) the requirements of this document; b) is effectively implemented and maintained.",
"Title: Stratos Inventory Drone X1 Maintenance Procedure. Effective Date: 2023-09-15. Owner: Marc Petit, Ops Manager. Purpose: To provide guidelines for the routine maintenance of the Stratos Inventory Drone X1 to maximize operational efficiency. Scope: This procedure applies to the maintenance team and involves inspections, cleanings, and repairs. Process: 1) Daily Inspection. Check the physical condition of the drone, including propellers, battery, and sensors. Document findings using the Daily Maintenance Log (DML-2023). 2) Cleaning. Remove dust and debris from all surfaces with a soft cloth and appropriate cleaning agents. For the sensor, ensure there are no obstructions; clean with a microfiber cloth. 3) Weekly Review. Conduct a more thorough inspection every week. Examine internal components for wear and tear. Any significant findings must be reported to QA for evaluation. 4) Annual Overhaul. An extensive inspection should be performed every 12 months, where all parts are evaluated and replaced as necessary. Results are stored in the Annual Review Document (ARD-2024) for historical tracking.",
"Title: Stratos Inventory Drone X1 Operator Training Course. Date: 2023-10-15. Conducted by: Emily Rios, HR Director. Attendees: 15 new operators from various departments. Course Outline: 1) Overview of drone capabilities and functionalities. 2) Hands-on calibration and maintenance training. 3) Safety protocols and incident response procedures. 4) Review of performance monitoring metrics. Outcomes: All participants successfully completed the training, with an average score of 90% on the final assessment. Feedback indicated that 85% of attendees felt confident in their ability to operate the drones post-training. Follow-up sessions will be scheduled for Q1 2024 to provide refreshers and cover updates from the latest performance evaluation. Certification records will be stored in the Training Database (TD-2023) for reference and future training needs.",
"Calibration Records (ID: CR-2023-56) dated May 1, 2023, for the Tactical Communication System E indicate that the system's signal integrity performance aligns with operational specifications. The RF Transceiver Module was calibrated to operate within the frequency range of 30 MHz to 512 MHz, achieving a signal-to-noise ratio of 30 dB across the tested range. This ensures clarity and reliability during tactical communications. The calibration was conducted by certified technician Bob Johnson, usi"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.
Full model architecture:

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Normalize({})
)
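The module list above amounts to three steps: the BERT encoder produces one vector per token, the pooling layer keeps the vector of the first ([CLS]) token, and the result is L2-normalized. The following is a minimal NumPy sketch of the last two steps only, using random numbers in place of real encoder output. The function name `cls_pool_and_normalize` is illustrative and not part of sentence-transformers.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first token's vector per sequence, then scale it to unit length.

    token_embeddings: array of shape (batch, seq_len, hidden).
    Returns an array of shape (batch, hidden) whose rows have norm 1.
    """
    cls_vectors = token_embeddings[:, 0, :]                      # 'pooling_mode': 'cls'
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)             # Normalize()

# Stand-in for encoder output: 2 sequences, 5 tokens, 768 dims (random, not real data).
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 5, 768))
sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_embeddings.shape)  # (2, 768)
```

Because every output row has unit length, the cosine similarity between two embeddings is the same as their dot product.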
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("nancy-noubou/bge-base-iso-clauses-v1")
# Run inference
sentences = [
'Represent this sentence for searching relevant passages: The organization shall maintain documented information to the extent necessary to have confidence that the processes are being carried out as planned and to demonstrate the conformity of products and services to requirements. The organization shall determine: a) what documented information is necessary for the effectiveness of the quality management system; b) the documented information to be retained to provide evidence of conformity; and c) the period for which it shall be retained.',
'Management Review Notes: Date: 2023-12-01. Participants: Marc Petit (Ops Manager), Sarah Mendez (QA Lead), David Huang (Engineering Supervisor). Agenda Items: 1) Review of Q3 performance metrics related to the Lumen Hull Sensor Edge. 2) Discussions on improving incident response times. 3) Updates on supplier performance and feedback. Decisions Made: - Metrics showed room for improvement in sensor accuracy under varying conditions; action required from the engineering team to address findings before the next review. - A timeline to implement the proposed incident response targets was established, with updates due by the next quarterly meeting on 2024-01-15. - Agreed to continue monitoring Supplier A’s performance and reassess in the Q1 review. Next Meeting: Scheduled for 2024-01-15 to discuss progress and metrics.',
'The calibration records (Document ID: CR-2023-045) for the Surgical Robot R indicate that the robotic arm calibration was last performed on August 15, 2022, making it 14 months overdue for recalibration. The last recorded precision test showed an average positioning error of 1.5 mm, which is above the acceptable threshold of 0.5 mm for surgical applications. These records were compiled by Mark Johnson from the Calibration Department. The outdated calibration status of the robotic arm poses a ris',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4010, 0.1300],
# [0.4010, 1.0000, 0.1790],
# [0.1300, 0.1790, 1.0000]])
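
For retrieval, only the first row of that matrix matters: it compares the prefixed query (index 0) with each candidate passage. The sketch below ranks the two passages using the scores printed above. `rank_passages` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

# Similarity matrix copied from the printed output above.
similarities = np.array([
    [1.0000, 0.4010, 0.1300],
    [0.4010, 1.0000, 0.1790],
    [0.1300, 0.1790, 1.0000],
])

def rank_passages(sim_matrix: np.ndarray, query_index: int = 0) -> list:
    """Return (passage_index, score) pairs sorted by descending similarity to the query."""
    scores = sim_matrix[query_index]
    candidates = [i for i in range(len(scores)) if i != query_index]
    pairs = [(i, float(scores[i])) for i in candidates]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

ranking = rank_passages(similarities)
print(ranking)  # [(1, 0.401), (2, 0.13)]
```

For the documented-information clause used as the query, the management review notes (0.4010) rank above the calibration record (0.1300).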

Columns: sentence_0, sentence_1, and sentence_2

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |

| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| Represent this sentence for searching relevant passages: The manufacturer shall determine measures that are appropriate for reducing the risks to an acceptable level. The manufacturer shall use one or more of the following options in the priority order listed: a) inherently safe design and manufacture; | In the Design Review Document (ID: DRM-2023-04), dated March 15, 2023, the engineering team conducted a comprehensive risk analysis for the Cold Chain Monitor N. During the review, it was determined that utilizing a temperature-resistant housing made from ABS plastic (with a thermal resistance rating of -40°C to 70°C) significantly reduces the risk of equipment failure in extreme conditions. The housing design was verified through thermal cycling tests, showing a consistent performance with an i | In the HR training schedule released on August 20, 2023 (Doc ID: HR-TRAIN-CCM-2023-08), the focus is on the onboarding process for new employees involved in the Cold Chain Monitor N project. The document outlines a comprehensive two-week orientation that covers company policies, team structure, and basic operational procedures. Notably, it includes a session on the importance of maintaining strict temperature controls during product handling, which is vital for the device's effectiveness. Althou |
| Represent this sentence for searching relevant passages: The organization shall determine and manage the knowledge necessary for the operation of its processes and to achieve conformity of products and services. This knowledge shall be maintained and made available to the extent necessary. | Date: 2024-02-20. Attendees: Senior Management Team including Marc Petit (Ops Manager), Sarah Mendez (Quality Manager), and Emma Li (Regulatory Affairs). Agenda Items: 1) Review quarterly performance metrics. 2) Discuss customer feedback findings. 3) Evaluate training program effectiveness. Notes: 1) Performance metrics show a 20% increase in production efficiency; actions taken in the prior quarter are yielding results. 2) Customer feedback indicated areas for improvement particularly in user instructions; an initiative to revise manuals was approved. 3) Training effectiveness was acknowledged, and it was decided to implement bi-annual refresher sessions. Decisions Made: 1) Marcy to lead the manual revision initiative with expected completion by April 30, 2024. 2) Sarah to outline a plan for the bi-annual refresher sessions, targeting early May 2024 for the first session. | Title: Risk Assessment for Quantum Diagnostic Imager Elite. Date: 2024-03-01. Conducted by: Marc Petit, Operations Manager, and Sarah Mendez, Quality Manager. Identified Risks: 1) Risk of imaging inaccuracy due to equipment malfunction. 2) Supplier dependency affecting materials quality. Mitigation Actions: For the first risk, a comprehensive calibration schedule has been established, with reminders set in the system to ensure timely execution. Additionally, the training program for technicians has been enhanced to include troubleshooting for common malfunctions. For the second risk, diversifying suppliers has been outlined as a strategy, with an evaluation of potential candidates already underway. Monitoring plans include quarterly reviews of equipment performance and supplier assessments, documented in the Risk Management Log (RML-2024). |
| Represent this sentence for searching relevant passages: The results of this review shall be recorded in the management file. Compliance is checked by inspection of the evaluation of overall residual risk. The manufacturer shall evaluate the overall residual risk posed by the medical device, taking into account the contributions of all risk control measures that have been implemented and verified, in relation to the criteria for acceptability of the overall residual risk defined in the risk management plan. If the overall residual risk is judged acceptable, the manufacturer shall inform users of significant residual risks and shall include the necessary information in the documentation in order to disclose those residual risks. | Calibration records (ID: CCMO-CAL-2023-012) for the Cold Chain Monitor O were last updated on May 10, 2023. The temperature sensors were calibrated using NIST-traceable standards, with a measured accuracy of ±0.2°C. This process was conducted by the Calibration Specialist, Emily White, and included the verification of 10 sensors across different units. Each calibration was documented with specific reference to the calibration equipment used, which is regularly maintained and validated against pr | An internal draft titled 'Cold Chain Monitor O Risk Management Procedure' (Document ID: DRAFT-2023-009) was circulated on April 5, 2023. This document outlines a proposed framework for identifying risks associated with the product's performance in differing environmental conditions. While the draft emphasizes the significance of risk evaluation, it lacks specific metrics or a step-by-step process for implementation. The final approval of this procedure is pending, with no set date for completion |
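
Each sample above is a three-column record whose first column always begins with the same query prefix. A small sketch of how such a record might be assembled follows. The `Triplet` class, `make_triplet` helper, and the shortened example passages are all made up for illustration; the card does not describe how the dataset was actually built.

```python
from dataclasses import dataclass

# Prefix seen at the start of every sentence_0 value in the samples above.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

@dataclass(frozen=True)
class Triplet:
    sentence_0: str  # prefixed clause, used as the query
    sentence_1: str  # first passage paired with the clause
    sentence_2: str  # second passage paired with the clause

def make_triplet(clause: str, passage_a: str, passage_b: str) -> Triplet:
    """Add the query prefix to the clause unless it is already there."""
    query = clause if clause.startswith(QUERY_PREFIX) else QUERY_PREFIX + clause
    return Triplet(query, passage_a, passage_b)

# Shortened, invented passages for illustration only.
example = make_triplet(
    "The organization shall determine and manage the knowledge necessary for the operation of its processes.",
    "Meeting notes recording a decision to run refresher training sessions.",
    "Risk assessment listing supplier dependency as a risk.",
)
print(example.sentence_0)
```

Applying the same prefix at query time, as the usage snippets do, keeps inference inputs in the format the model saw during training.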
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc"
],
"partition_mode": "joint",
"hardness_mode": null,
"hardness_strength": 0.0
}
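
With `similarity_fct` set to `cos_sim` and `scale` set to 20.0, this loss can be read as a cross-entropy over scaled cosine similarities: each query's paired passage is the correct class, and every other passage in the batch acts as a negative. The NumPy sketch below is a simplified illustration of that idea for the `query_to_doc` direction only. It is not the library's implementation, it ignores the `partition_mode` and hardness settings, and the function name and toy vectors are made up.

```python
import numpy as np

def mnrl_sketch(queries, positives, negatives=None, scale=20.0):
    """In-batch-negatives ranking loss: cross-entropy over scale * cos_sim(query, candidates)."""
    def unit(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    q = unit(queries)
    candidates = unit(positives)
    if negatives is not None:
        # A third column is appended as extra negatives shared by every query.
        candidates = np.vstack([candidates, unit(negatives)])

    logits = scale * (q @ candidates.T)                     # shape (batch, n_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(q))                             # query i matches candidate i
    return float(-log_probs[targets, targets].mean())

# Toy 2-D vectors: each query points roughly the same way as its own positive.
queries = [[1.0, 0.1], [0.1, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnrl_sketch(queries, positives)
swapped = mnrl_sketch(queries, positives[::-1])
print(aligned < swapped)  # True
```

The loss is near zero when each query is closest to its own passage and grows when the pairing is swapped, which is the ranking behavior the training objective rewards.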

Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: None
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.3401 | 500 | 2.6430 |
| 0.6803 | 1000 | 2.2176 |
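
The two log rows allow a rough estimate of the training set size, since steps per epoch is approximately step divided by epoch fraction. The sketch below assumes training ran on a single device, which the card does not state, and uses the listed batch size of 16 with no gradient accumulation. The epoch values are rounded to four decimals, so the result is approximate.

```python
# Values taken from the training log and hyperparameters above.
log_rows = [(0.3401, 500), (0.6803, 1000)]    # (epoch fraction, step)
per_device_train_batch_size = 16
gradient_accumulation_steps = 1
num_devices = 1                                # assumption: not stated in the card

for epoch_fraction, step in log_rows:
    steps_per_epoch = step / epoch_fraction
    samples = (steps_per_epoch * per_device_train_batch_size
               * gradient_accumulation_steps * num_devices)
    print(f"step {step}: ~{steps_per_epoch:.0f} steps/epoch, ~{samples:.0f} training samples")
```

Both rows give about 1,470 steps per epoch, which under these assumptions corresponds to roughly 23,500 training samples.
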
@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}

@misc{oord2019representationlearningcontrastivepredictive,
  title = "Representation Learning with Contrastive Predictive Coding",
  author = "Aaron van den Oord and Yazhe Li and Oriol Vinyals",
  year = "2019",
  eprint = "1807.03748",
  archivePrefix = "arXiv",
  primaryClass = "cs.LG",
  url = "https://arxiv.org/abs/1807.03748",
}

Base model: BAAI/bge-base-en-v1.5