SentenceTransformer based on nvidia/NV-Embed-v2

This is a sentence-transformers model finetuned from nvidia/NV-Embed-v2. It maps sentences & paragraphs to a 4096-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nvidia/NV-Embed-v2
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 4096 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: NVEmbedModel 
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
  (2): Normalize()
)
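
The three modules above correspond to the NV-Embed transformer, mean pooling (with the instruction prompt excluded via include_prompt: False), and L2 normalization. As a minimal sketch, assuming the model is loaded as in the Usage section below, this configuration can be inspected directly:

from sentence_transformers import SentenceTransformer

# trust_remote_code=True is assumed here because NV-Embed-v2 ships the custom NVEmbedModel architecture.
model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft", trust_remote_code=True)

print(model.get_sentence_embedding_dimension())  # 4096
print(model.max_seq_length)                      # 1024
print(model[1].get_config_dict())                # pooling settings, including include_prompt=False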

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft")
# Run inference
sentences = [
    'Instruct: Given a question, retrieve passages that answer the question. Query: what is the total dose administered in the EBRT Intensity Modulated Radiation Therapy?',
    'Source: SOAP_Note. Date: 2020-03-13. Context: MV electrons.\n \n FIELDS:\n The right orbital mass and right cervical lymph nodes were initially treated with a two arc IMRT plan. Arc 1: 11.4 x 21 cm. Gantry start and stop angles 178 degrees / 182 degrees. Arc 2: 16.4 x 13.0 cm. Gantry start ',
    'Source: Radiology. Date: 2023-09-18. Context: : >60\n \n Contrast Type: OMNI 350\n   Volume: 80ML\n \n Lot_: ________\n \n Exp. date: 05/26 \n Study Completed: CT CHEST W\n \n Reading Group:BCH \n  \n   Prior Studies for Comparison: 06/14/23 CT CHEST W RMCC  \n \n ________ ______\n  ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
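
The prompts entry under Training Hyperparameters shows that queries were prefixed with an instruction during training, while passages were not. A minimal retrieval sketch, reusing the model loaded above; the query and passages are illustrative placeholders:

# Reusing `model` from the snippet above.
query = "Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?"
passages = [
    "Source: Pathology. Date: 2019-12-12. Context: BRAF SNAPSHOT Results: POSITIVE ...",
    "Source: Radiology. Date: 2023-09-18. Context: CT CHEST W ...",
]

query_embedding = model.encode([query])      # instruction-prefixed query
passage_embeddings = model.encode(passages)  # passages are encoded without the instruction

# Rank passages by cosine similarity (embeddings are L2-normalized by the Normalize module).
scores = model.similarity(query_embedding, passage_embeddings)
print(scores)  # shape [1, 2]; higher score = better match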

Evaluation

Metrics

Patient QA

Metric Value
cosine_accuracy@1 0.6856
cosine_accuracy@3 0.9531
cosine_accuracy@5 0.9909
cosine_accuracy@10 1.0
cosine_precision@1 0.6856
cosine_precision@3 0.5209
cosine_precision@5 0.3969
cosine_precision@10 0.2251
cosine_recall@1 0.4203
cosine_recall@3 0.8154
cosine_recall@5 0.9454
cosine_recall@10 1.0046
cosine_ndcg@10 0.8649
cosine_mrr@10 0.8191
cosine_map@100 0.805
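
These metrics follow the standard information-retrieval evaluation in sentence-transformers. A minimal sketch of computing comparable numbers with InformationRetrievalEvaluator; the queries, corpus, and relevance judgments below are placeholders, not the actual Patient QA evaluation data:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: query id -> text, passage id -> text, query id -> relevant passage ids.
queries = {"q1": "Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?"}
corpus = {
    "d1": "Source: Pathology. Date: 2019-12-12. Context: ... A BRAF mutation was detected in the provided specimen.",
    "d2": "Source: Radiology. Date: 2023-09-18. Context: ... CT CHEST W ...",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries=queries, corpus=corpus, relevant_docs=relevant_docs, name="patient-qa")
results = evaluator(model)  # `model` as loaded in the Usage section
print(results["patient-qa_cosine_ndcg@10"])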

Training Details

Training Dataset

Unnamed Dataset

  • Size: 16,186 training samples
  • Columns: question and context
  • Approximate statistics based on the first 1000 samples:
    • question: string; min: 25 tokens, mean: 30.78 tokens, max: 39 tokens
    • context: string; min: 74 tokens, mean: 177.84 tokens, max: 398 tokens
  • Samples:
    • question: Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?
      context: Source: Genetic_Testing. Date: 2022-10-07. Context: Mutational Seq DNA-Tumor Low, 6 mt/Mb NF1 Seq DNA-Tumor Mutation Not Detected T In Not D ARID2 Seq DNA-Tumor Mutation Not Detected CNA-Seq DNA-Tumor Deletion Not Detected PTEN Seq RNA-Tumor Fusion Not Detected Seq DNA-Tumor Mutation Not Detected BRAF Amplification Not _ CNA-Seq DNA-Tumor Detected RAC1 Seq DNA-Tumor Mutation Not Detected The selection of any, all, or none of the matched therapies
    • question: Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?
      context: Source: Genetic_Testing. Date: 2021-06-04. Context: characteristics have been determined by _____ ____ _________ ___ ____ _______. It has not been cleared or approved by FDA. This assay has been validated pursuant to the CLIA regulations and is used for clinical purposes. BRAF MUTATION ANALYSIS E SOURCE: LYMPH NODE PARAFFIN BLOCK NUMBER: - A4 BRAF MUTATION ANALYSIS NOT DETECTED NOT DETECTED This result was reviewed and interpreted by _. ____, M.D. Based on Sanger sequencing analysis, no mutations
    • question: Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?
      context: Source: Pathology. Date: 2019-12-12. Context: Receive Date: 12/12/2019 ___ _: ________________ Accession Date: 12/12/2019 Copy To: Report Date: 12/19/2019 18:16 SUPPLEMENTAL REPORT (previous report date: 12/19/2019) BRAF SNAPSHOT Results: POSITIVE Interpretation: A BRAF mutation was detected in the provided specimen. FDA has approved TKI inhibitor vemurafenib and dabrafenib for the first-line treatment of patients with unresectable or metastatic melanoma whose tumors have a BRAF V600E mutation, and trametinib for tumors
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
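
    With this loss, each question is contrasted against every context in the batch: its own context is the positive and the other in-batch contexts act as negatives, with the cosine similarities multiplied by the scale of 20.0 before a softmax cross-entropy. A minimal sketch of that scoring, with toy tensors standing in for the model's embeddings:

    import torch
    import torch.nn.functional as F

    # Toy, L2-normalized stand-ins for a batch of 4 question and 4 context embeddings.
    questions = F.normalize(torch.randn(4, 4096), dim=-1)
    contexts = F.normalize(torch.randn(4, 4096), dim=-1)

    scale = 20.0
    scores = scale * (questions @ contexts.T)  # scaled cosine similarities, shape [4, 4]
    labels = torch.arange(4)                   # the i-th context is the positive for the i-th question
    loss = F.cross_entropy(scores, labels)
    print(loss)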
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 6789
  • bf16: True
  • prompts: {'question': 'Instruct: Given a question, retrieve passages that answer the question. Query: '}
  • batch_sampler: no_duplicates
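
Putting the dataset format, the loss, and the non-default hyperparameters above together, a minimal fine-tuning sketch with the sentence-transformers v3 trainer; the in-memory dataset, the output directory, and the omission of the evaluator are simplifications, not the actual training pipeline used for this model:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# trust_remote_code=True is assumed for NV-Embed-v2's custom architecture.
model = SentenceTransformer("nvidia/NV-Embed-v2", trust_remote_code=True)

# Placeholder (question, context) rows; the real dataset has 16,186 such pairs.
train_dataset = Dataset.from_dict({
    "question": ["what was the abnormality identified for BRAF?"],
    "context": ["Source: Pathology. Date: 2019-12-12. Context: BRAF SNAPSHOT Results: POSITIVE ..."],
})

loss = MultipleNegativesRankingLoss(model, scale=20.0)  # cosine similarity is the default similarity_fct

args = SentenceTransformerTrainingArguments(
    output_dir="nv-embed-v2-finetuned",  # placeholder output directory
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=6789,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
    prompts={"question": "Instruct: Given a question, retrieve passages that answer the question. Query: "},
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()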

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 6789
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • prompts: {'question': 'Instruct: Given a question, retrieve passages that answer the question. Query: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss ontada-test_cosine_ndcg@10
0 0 - 0.8431
0.0002 1 1.5826 -
0.0371 150 0.4123 -
0.0741 300 0.3077 -
0.1112 450 0.2184 -
0.1483 600 0.3291 -
0.1853 750 0.2343 -
0.2224 900 0.2506 -
0.2471 1000 - 0.8077
0.2595 1050 0.1294 -
0.2965 1200 0.0158 -
0.3336 1350 0.0189 -
0.3706 1500 0.0363 -
0.4077 1650 0.0208 -
0.4448 1800 0.475 -
0.4818 1950 0.6183 -
0.4942 2000 - 0.8482
0.5189 2100 0.4779 -
0.5560 2250 0.4194 -
0.5930 2400 0.8376 -
0.6301 2550 0.4249 -
0.6672 2700 0.9336 -
0.7042 2850 0.5351 -
0.7413 3000 1.0253 0.8551
0.7784 3150 0.3961 -
0.8154 3300 0.3881 -
0.8525 3450 0.5573 -
0.8895 3600 1.222 -
0.9266 3750 0.3032 -
0.9637 3900 0.3142 -
0.9884 4000 - 0.8645
1.0 4047 - 0.8649

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.4.0.dev0
  • Transformers: 4.46.0
  • PyTorch: 2.3.1+cu121
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}