---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:16186
- loss:MultipleNegativesRankingLoss
base_model: nvidia/NV-Embed-v2
widget:
- source_sentence: 'Instruct: Given a question, retrieve passages that answer the
question. Query: what is the numeric dose of the Pembrolizumab Regimen?'
sentences:
- "Source: Radiology. Date: 2019-11-06. Context: 11/06/2019 1:03:20 PM -0500496d70726f7665204865616c7468\
\ PAGE 2 OF 3\n ________ ________ ________\n___ _____ ___ _____ _____, __\
\ _____-____\nIMAGING SERVICES\nPatient Name: Exam Date/Time: Phone _: \
\ MRN:\nYoung, _______ _______ 11/06/2019 11:50 AM ___-___-____ ______\n\
DOB: Se Account _:\n11/3/1939 Female _________\nPt Class: Accession\
\ _: Performing Department:\nOutpatient _________ MRI - FMH\nPrimary\
\ Care Provider: Ordering Provider: Authorizing Provider:\n______, ____\
\ _ ______, _______ _ ______, _______ _\nLaterality:\n9 Final - MRI BRAIN\
\ W/WO CONT"
- 'Source: SOAP_Note. Date: 2022-01-30. Context: _12 TAB
Prov: 01/19/22
D: 01/23/22 1545 Patient stopped taking
Reported Medications
ONDANSETRON (ZOFRAN) 4 MG PO Q6H
Metoprolol Succinate (TOPROL XL) 50 MG PO DAILY
predniSONE 5 MG PO DAILY
TRAMETINIB DIMETHYL SULFOXIDE (MEKINIST) 2 MG PO DAILY
DABRAFENIB MESYLATE (TAFINLAR) 100 MG PO BID
LOSARTAN (COZAAR) 50 MG PO DAILY
MIRTAZAPINE (REMERON) 7.5 MG PO BEDTIME
MED LIST INFORMATION 1 EA - CANCEL AT DISCHARGE
Additional Medical History
PMH:
Stage 4 Melanoma Cancer
Additional Surgical History
'
- "Source: SOAP_Note. Date: 2024-02-17. Context: 60 mg-90 mg-500 mg) qd \n* Metoprolol\
\ Oral 24 hr Tab (Succinate) 25 mg tablet extended release 24 hr \n Regimens:\n\
\ Pembrolizumab Q21D (Flat Dose) (Adjuvant Melanoma, RCC)\n Hydration IV and Electrolyte\
\ Replacement Supportive Care\n \n \n \n Allergies\n "
- source_sentence: 'Instruct: Given a question, retrieve passages that answer the
question. Query: how many Radiation Therapy fractions were administered?'
sentences:
- "Source: SOAP_Note. Date: 2024-10-03. Context: PET with large volume metastatic\
\ disease involving the bones, soft tissue, and lung parenchyma bilaterally.\n\
\ - Radiation therapy left shoulder, right SI joint, right femur completed 1/5/22.\n\
\ - Nivolumab and ipilimumab initiated 11/24/21. "
- 'Source: SOAP_Note. Date: 2019-08-21. Context: 4 weeks, Print on Rx., Instructions/Comments:
nivolumab. [Updated. _______ _. _____ 08/21/2019 13:56].
Cancer Regimens Nivolumab Q28D (Flat Dose, Adjuvant Melanoma): C2D1. [_______
_. _____ 08/21/2019 15:18].I.V. access: peripheral IV, Site: '
- "Source: SOAP_Note. Date: 2023-11-27. Context: per day, down from 1.5 ppd. He\
\ has been smoking for the past 40 years.\n He denies alcohol use.\n He worked\
\ for ____ ______ / _____ _____ _____ \n \n FAMILY HISTORY:\n Mother,\
\ age 94, Merkle cell carcinoma in her 70s. Daughter, age 52, brain tumor.\n Father,\
\ deceased at age 66, heart disease.\n \n REVIEW OF SYSTEMS: A comprehensive\
\ (10+) review of systems was performed today and was negative unless noted above.\n\
\ \n VITALS: Blood pressure: 128/79, Sitting, Regular, Pulse: 110, "
- source_sentence: 'Instruct: Given a question, retrieve passages that answer the
question. Query: when did the Dabrafenib Regimen start?'
sentences:
- 'Source: SOAP_Note. Date: 2018-11-29. Context: Take 1 PO daily, Instructions:
Take at least 1 hour before or two hours after a meal. [______ ______ 12/26/2018
13:46].Dabrafenib mesylate, po solid: 75 mg Capsule Take 2 PO BID, Instructions:
Take whole, at least 1 hour before or two hours after a '
- "Source: Pathology. Date: 2021-06-22. Context: Referral: SECONDARY AND UNSPECIFIED\
\ MALIGNANT NEOPLASM OF LYMPH\nNODE, UNSPECIFIED\nFX4\nResults HEENT: \n\
HEE BRAF V600E\nNot Expressed\n1\n\n M\n19 \n1.10 78\nH\n\n1\n* A \
\ \nA\nI \nIntended Use:\nStains were scored by a pathologist using "
- "Source: SOAP_Note. Date: 2024-09-16. Context: \
\ Mr. _____ is married and he lives with his wife in _____ _____, __.\n The\
\ patient has cut back to 5 cigarettes per day, down from 1.5 ppd. He has been\
\ smoking for the past 40 years.\n He denies alcohol use.\n He worked for Duke\
\ Energy / "
- source_sentence: 'Instruct: Given a question, retrieve passages that answer the
question. Query: when was the Reexcision performed?'
sentences:
- "Source: SOAP_Note. Date: 2024-06-13. Context: scan showed cutaneous involvement\
\ in the skin and also right inguinal adenopathy. No evidence of distant metastases.\
\ Opdualag _1.\n \n 10/03/2023: The patient complains of vertigo and wants to\
\ delay her next treatment. We will add Dramamine.\n \n "
- "Source: Pathology. Date: 2022-03-23. Context: MD ______, _______\n________\
\ ____ _________ - _______ ____ DOB: 09/14/1959\n______ ____ __ ____ Rd\
\ Age: 62\n__ _____ ___ Sex: Male\n___ _____, __ _____\n___-___-____\n\
\ 8 Accession _: ____-_____\nCollection Date: 03/23/2022\nollection Date:\
\ 03/23/ MRN: _____\nReceived Date: 03/23/2022\nReported Date: 03/24/2022\n\
SKIN, MID FRONTAL SCALP, EXCISION -\nNO EVIDENCE OF MALIGNANCY, FINAL MARGINS\
\ FREE OF TUMOR.\nSEE COMMENT.\nComment: Portions of deep subcutaneous fat and\
\ fascia are seen, all free of malignancy.\n\n_______ _. ______, MD\n**Electronically\
\ Signed on 24 MAR 2022 12:03PM** 8\nCLINICAL DATA:\nMID FRONTAL SCALP - EXCISION"
- "Source: Genetic_Testing. Date: 2023-08-21. Context: and a STERETCHING\nvariants\
\ including genes associated wi 08 in 7/31 \n18 comination repair deficiency\
\ * fusion NTR2 on \n11 (HR/HRD, microsatellite instability (MS gain\
\ Eston\nare umr mutational surgen 3. Kat "
- source_sentence: 'Instruct: Given a question, retrieve passages that answer the
question. Query: what is the total dose administered in the EBRT Intensity Modulated
Radiation Therapy?'
sentences:
- "Source: SOAP_Note. Date: 2022-10-10. Context: given. \n \n Interim History\n\
\ \n _____ was last seen on 09/16/2022, at which time he started adjuvant immunotherapy\
\ with Keytruda q21 days. Here today for follow up and labs prior to C2 of treatment.\
\ States he is overall feeling well. Tolerated the "
- "Source: SOAP_Note. Date: 2020-03-13. Context: MV electrons.\n \n FIELDS:\n The\
\ right orbital mass and right cervical lymph nodes were initially treated with\
\ a two arc IMRT plan. Arc 1: 11.4 x 21 cm. Gantry start and stop angles 178 degrees\
\ / 182 degrees. Arc 2: 16.4 x 13.0 cm. Gantry start "
- "Source: Radiology. Date: 2023-09-18. Context: : >60\n \n Contrast Type: OMNI\
\ 350\n Volume: 80ML\n \n Lot_: ________\n \n Exp. date: 05/26 \n Study Completed:\
\ CT CHEST W\n \n Reading Group:BCH \n \n Prior Studies for Comparison: 06/14/23\
\ CT CHEST W RMCC \n \n ________ ______\n "
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on nvidia/NV-Embed-v2
results:
- task:
type: patient-qa
name: Patient QA
dataset:
name: ontada test
type: ontada-test
metrics:
- type: cosine_accuracy@1
value: 0.6856459330143541
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9531100478468899
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.990909090909091
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6856459330143541
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5208931419457735
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.39693779904306226
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.22511961722488041
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.4202789169894433
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8154078377762588
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9453700539226855
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0046297562087037
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8649347118737546
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8190546441862219
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.804978870109979
name: Cosine Map@100
---
# SentenceTransformer based on nvidia/NV-Embed-v2
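This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nvidia/NV-Embed-v2](https://huggingface.co/nvidia/NV-Embed-v2) on question/passage pairs drawn from clinical records (SOAP notes and radiology, pathology, and genetic-testing reports). It maps sentences and paragraphs to a 4096-dimensional dense vector space and was trained to retrieve the passages that answer a question about a patient's records. It can also be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.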
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [nvidia/NV-Embed-v2](https://huggingface.co/nvidia/NV-Embed-v2)
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 4096 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: NVEmbedModel
(1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
(2): Normalize()
)
```
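The `Pooling` and `Normalize()` stages above can be illustrated with a small NumPy sketch. It assumes the per-token embeddings have already been produced by the `NVEmbedModel` backbone (which is not modeled here) and mimics `pooling_mode_mean_tokens: True` with `include_prompt: False`: padding tokens and the leading instruction-prompt tokens are left out of the average, and the result is scaled to unit length. The helper names and the toy 4-dimensional vectors are illustrative only, not part of the library.

```python
import numpy as np

def mean_pool_exclude_prompt(token_embs, attention_mask, prompt_length):
    """Average token embeddings, skipping padding and the first `prompt_length` tokens.

    token_embs: (seq_len, dim) per-token embeddings
    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding
    prompt_length: number of leading tokens belonging to the instruction prompt
    """
    mask = attention_mask.astype(float)  # astype returns a new array
    mask[:prompt_length] = 0.0  # include_prompt=False: prompt tokens do not count
    summed = (token_embs * mask[:, None]).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # guard against division by zero
    return summed / count

def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize() module does."""
    return vec / max(np.linalg.norm(vec), 1e-12)

# Toy input: 2 prompt tokens, 2 text tokens, 1 padding token (dim 4 instead of 4096)
token_embs = np.array([
    [9.0, 9.0, 9.0, 9.0],  # prompt token
    [9.0, 9.0, 9.0, 9.0],  # prompt token
    [1.0, 0.0, 0.0, 0.0],  # text token
    [3.0, 0.0, 0.0, 0.0],  # text token
    [7.0, 7.0, 7.0, 7.0],  # padding
])
attention_mask = np.array([1, 1, 1, 1, 0])
pooled = mean_pool_exclude_prompt(token_embs, attention_mask, prompt_length=2)
emb = l2_normalize(pooled)

# With unit-length vectors, cosine similarity is just a dot product
a = l2_normalize(np.array([3.0, 4.0]))
b = l2_normalize(np.array([4.0, 3.0]))
cos = float(np.dot(a, b))
print(pooled, emb, cos)
```

Because every output vector has unit length, the cosine similarity listed under **Similarity Function** reduces to a plain dot product, which is what the last lines check.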
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub. The model uses a custom NVEmbedModel class,
# so remote code must be trusted when loading it.
model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft", trust_remote_code=True)
# Run inference: the query carries the instruction prefix used during training;
# the passages are encoded as-is
sentences = [
    'Instruct: Given a question, retrieve passages that answer the question. Query: what is the total dose administered in the EBRT Intensity Modulated Radiation Therapy?',
    'Source: SOAP_Note. Date: 2020-03-13. Context: MV electrons.\n \n FIELDS:\n The right orbital mass and right cervical lymph nodes were initially treated with a two arc IMRT plan. Arc 1: 11.4 x 21 cm. Gantry start and stop angles 178 degrees / 182 degrees. Arc 2: 16.4 x 13.0 cm. Gantry start ',
    'Source: Radiology. Date: 2023-09-18. Context: : >60\n \n Contrast Type: OMNI 350\n Volume: 80ML\n \n Lot_: ________\n \n Exp. date: 05/26 \n Study Completed: CT CHEST W\n \n Reading Group:BCH \n \n Prior Studies for Comparison: 06/14/23 CT CHEST W RMCC \n \n ________ ______\n ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 4096)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
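The example above computes a full similarity matrix but does not rank anything. For retrieval, a query embedding is typically scored against a set of passage embeddings and only the top results are kept. A minimal sketch, assuming embeddings that are already L2-normalized (this model's final module is `Normalize()`), so cosine similarity is a dot product; `rank_passages` is a hypothetical helper and the 3-dimensional toy vectors stand in for real 4096-dimensional embeddings.

```python
import numpy as np

def rank_passages(query_emb, passage_embs, top_k=3):
    """Return (index, score) pairs for the top_k passages by cosine similarity.

    Assumes query_emb (dim,) and passage_embs (n, dim) are already L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = passage_embs @ query_emb
    order = np.argsort(-scores, kind="stable")[:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],  # unrelated passage
    [0.8, 0.6, 0.0],  # closely related passage
    [0.6, 0.0, 0.8],  # partially related passage
])
ranking = rank_passages(query, passages, top_k=2)
print(ranking)  # [(1, 0.8), (2, 0.6)]
```

With the real model, `query_emb` and `passage_embs` would come from `model.encode(...)` on the prefixed question and on the raw passages respectively.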
## Evaluation
### Metrics
#### Patient QA
* Dataset: `ontada-test`
* Evaluated with [PatientQAEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.PatientQAEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.6856 |
| cosine_accuracy@3 | 0.9531 |
| cosine_accuracy@5 | 0.9909 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.6856 |
| cosine_precision@3 | 0.5209 |
| cosine_precision@5 | 0.3969 |
| cosine_precision@10 | 0.2251 |
| cosine_recall@1 | 0.4203 |
| cosine_recall@3 | 0.8154 |
| cosine_recall@5 | 0.9454 |
| cosine_recall@10 | 1.0046 |
| **cosine_ndcg@10** | **0.8649** |
| cosine_mrr@10 | 0.8191 |
| cosine_map@100 | 0.805 |
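As a rough guide to reading the table, the sketch below computes the @k metrics for a single query with binary relevance, using common textbook definitions. The exact formulas used by `PatientQAEvaluator` are not documented here, and the dataset-level numbers are presumably means over all test queries. Under these definitions recall@k cannot exceed 1, so the reported `cosine_recall@10` of 1.0046 suggests the evaluator computes recall somewhat differently. `retrieval_metrics` is a hypothetical helper.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k=10):
    """Binary-relevance retrieval metrics for one query, given a ranked list of doc ids."""
    top_k = ranked_ids[:k]
    hits = [1 if doc in relevant_ids else 0 for doc in top_k]
    num_hits = sum(hits)
    # Reciprocal rank of the first relevant document within the top k (0 if none)
    mrr = next((1.0 / (rank + 1) for rank, h in enumerate(hits) if h), 0.0)
    # DCG with a log2 discount; the ideal DCG places all relevant documents first
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return {
        f"accuracy@{k}": 1.0 if num_hits > 0 else 0.0,
        f"precision@{k}": num_hits / k,
        f"recall@{k}": num_hits / len(relevant_ids),
        f"mrr@{k}": mrr,
        f"ndcg@{k}": dcg / idcg if idcg > 0 else 0.0,
    }

# One query with two relevant passages, retrieved at ranks 2 and 3
m = retrieval_metrics(["p7", "p1", "p4", "p9"], {"p1", "p4"}, k=3)
print(m)
```

The same definitions also explain why precision@10 (0.2251) sits far below accuracy@10 (1.0): when a query has only a few relevant passages, most of the 10 retrieved slots cannot be hits.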
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 16,186 training samples
* Columns: `question` and `context`
* Approximate statistics based on the first 1000 samples:
  |         | question | context |
  |:--------|:---------|:--------|
  | type    | string   | string  |
* Samples:
  | question | context |
  |:---------|:--------|
  | Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF? | Source: Genetic_Testing. Date: 2022-10-07. Context: Mutational Seq DNA-Tumor Low, 6 mt/Mb NF1<br>Seq DNA-Tumor Mutation Not Detected<br>T In Not D<br>ARID2 Seq DNA-Tumor Mutation Not Detected CNA-Seq DNA-Tumor Deletion Not Detected<br>PTEN<br>Seq RNA-Tumor Fusion Not Detected Seq DNA-Tumor Mutation Not Detected<br>BRAF<br>Amplification Not _<br>CNA-Seq DNA-Tumor Detected RAC1 Seq DNA-Tumor Mutation Not Detected<br>The selection of any, all, or none of the matched therapies |
  | Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF? | Source: Genetic_Testing. Date: 2021-06-04. Context: characteristics have been determined by _____ ___________<br>_______ _________ ___ ____ __________. It has not been<br>cleared or approved by FDA. This assay has been validated<br>pursuant to the CLIA regulations and is used for clinical<br>purposes.<br>BRAF MUTATION ANALYSIS E<br>SOURCE: LYMPH NODE<br>PARAFFIN BLOCK NUMBER: ____-_______ A4<br>BRAF MUTATION ANALYSIS NOT DETECTED NOT DETECTED<br>This result was reviewed and interpreted by _. ____, M.D.<br>Based on Sanger sequencing analysis, no mutations |
  | Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF? | Source: Pathology. Date: 2019-12-12. Context: Receive Date: 12/12/2019<br>___ _: ________________ Accession Date: 12/12/2019<br>Copy To: Report Date: 12/19/2019 18:16<br>***SUPPLEMENTAL REPORT***<br>(previous report date: 12/19/2019)<br>BRAF SNAPSHOT<br>Results:<br>POSITIVE<br>Interpretation:<br>A BRAF mutation was detected in the provided specimen.<br>FDA has approved TKI inhibitor vemurafenib and dabrafenib for the first-line treatment of patients with<br>unresectable or metastatic melanoma whose tumors have a BRAF V600E mutation, and trametinib for tumors |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
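`MultipleNegativesRankingLoss` treats each (question, context) pair as a positive and every other context in the same batch as a negative. The simplified NumPy sketch below shows that idea with the parameters above (`scale` 20.0, cosine similarity): scaled similarities form a batch-by-batch logit matrix, and cross-entropy pushes each question toward its own context on the diagonal. It omits hard negatives and gradients and is not the library's implementation; `mnrl_loss` is a hypothetical name.

```python
import numpy as np

def mnrl_loss(query_embs, context_embs, scale=20.0):
    """In-batch-negatives ranking loss with scaled cosine similarity.

    query_embs, context_embs: (batch, dim) arrays; row i of each forms a positive pair,
    and every other context in the batch acts as a negative for query i.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = context_embs / np.linalg.norm(context_embs, axis=1, keepdims=True)
    logits = scale * (q @ c.T)  # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for query i is context i (the diagonal)
    return float(-np.mean(np.diag(log_probs)))

# Batch of 4, matching per_device_train_batch_size; toy 8-dimensional vectors
rng = np.random.default_rng(6789)
contexts = rng.normal(size=(4, 8))
aligned = mnrl_loss(contexts + 0.01 * rng.normal(size=(4, 8)), contexts)  # queries match their contexts
shuffled = mnrl_loss(contexts[::-1], contexts)  # every query paired with the wrong context
print(aligned, shuffled)
```

The small per-device batch size of 4 means each question sees only three in-batch negatives, so the loss signal per step is fairly weak compared with larger batches.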
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 6789
- `bf16`: True
- `prompts`: {'question': 'Instruct: Given a question, retrieve passages that answer the question. Query: '}
- `batch_sampler`: no_duplicates
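Two of these settings change what the model sees in each batch. The `prompts` mapping prefixes only the `question` column with the instruction and leaves `context` untouched, and `batch_sampler: no_duplicates` keeps repeated texts out of a single batch. The second matters here because the same question (for example, the BRAF question in the samples above) appears with several contexts, and a duplicate inside a batch would act as a false negative under in-batch-negative training. The sketch below illustrates both ideas; `apply_prompts` and `no_duplicate_batches` are hypothetical helpers, the context strings are placeholders, and the greedy grouping is a simplification rather than the library's `NoDuplicatesBatchSampler`.

```python
QUESTION_PROMPT = "Instruct: Given a question, retrieve passages that answer the question. Query: "

def apply_prompts(example, prompts):
    """Prefix each column that has an entry in `prompts`; other columns pass through."""
    return {col: prompts.get(col, "") + text for col, text in example.items()}

def no_duplicate_batches(examples, batch_size):
    """Greedily group example indices so no text value repeats inside a batch."""
    remaining = list(range(len(examples)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(examples[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry in a later batch
        batches.append(batch)
        remaining = deferred
    return batches

prompts = {"question": QUESTION_PROMPT}
data = [
    {"question": "what was the abnormality identified for BRAF?", "context": "ctx A"},
    {"question": "what was the abnormality identified for BRAF?", "context": "ctx B"},
    {"question": "when did the Dabrafenib Regimen start?", "context": "ctx C"},
]
prompted = [apply_prompts(ex, prompts) for ex in data]
print(prompted[0]["question"])
print(no_duplicate_batches(prompted, batch_size=4))  # [[0, 2], [1]]
```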
#### All Hyperparameters