simran-t committed on
Commit 68d34af
1 Parent(s): 64900ad

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 4096,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": false
+ }
README.md ADDED
@@ -0,0 +1,586 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:16186
+ - loss:MultipleNegativesRankingLoss
+ base_model: nvidia/NV-Embed-v2
+ widget:
+ - source_sentence: 'Instruct: Given a question, retrieve passages that answer the
+ question. Query: what is the numeric dose of the Pembrolizumab Regimen?'
+ sentences:
+ - "Source: Radiology. Date: 2019-11-06. Context: 11/06/2019 1:03:20 PM -0500496d70726f7665204865616c7468\
+ \ PAGE 2 OF 3\n ________ ________ ________\n___ _____ ___ _____ _____, __\
+ \ _____-____\nIMAGING SERVICES\nPatient Name: Exam Date/Time: Phone _: \
+ \ MRN:\nYoung, _______ _______ 11/06/2019 11:50 AM ___-___-____ ______\n\
+ DOB: Se Account _:\n11/3/1939 Female _________\nPt Class: Accession\
+ \ _: Performing Department:\nOutpatient _________ MRI - FMH\nPrimary\
+ \ Care Provider: Ordering Provider: Authorizing Provider:\n______, ____\
+ \ _ ______, _______ _ ______, _______ _\nLaterality:\n9 Final - MRI BRAIN\
+ \ W/WO CONT"
+ - 'Source: SOAP_Note. Date: 2022-01-30. Context: _12 TAB
+
+ Prov: 01/19/22
+
+ D: 01/23/22 1545 Patient stopped taking
+
+ Reported Medications
+
+ ONDANSETRON (ZOFRAN) 4 MG PO Q6H
+
+ Metoprolol Succinate (TOPROL XL) 50 MG PO DAILY
+
+ predniSONE 5 MG PO DAILY
+
+ TRAMETINIB DIMETHYL SULFOXIDE (MEKINIST) 2 MG PO DAILY
+
+ DABRAFENIB MESYLATE (TAFINLAR) 100 MG PO BID
+
+ LOSARTAN (COZAAR) 50 MG PO DAILY
+
+ MIRTAZAPINE (REMERON) 7.5 MG PO BEDTIME
+
+ MED LIST INFORMATION 1 EA - CANCEL AT DISCHARGE
+
+ Additional Medical History
+
+ PMH:
+
+ Stage 4 Melanoma Cancer
+
+ Additional Surgical History
+
+ '
+ - "Source: SOAP_Note. Date: 2024-02-17. Context: 60 mg-90 mg-500 mg) qd \n* Metoprolol\
+ \ Oral 24 hr Tab (Succinate) 25 mg tablet extended release 24 hr \n Regimens:\n\
+ \ Pembrolizumab Q21D (Flat Dose) (Adjuvant Melanoma, RCC)\n Hydration IV and Electrolyte\
+ \ Replacement Supportive Care\n \n \n \n Allergies\n "
+ - source_sentence: 'Instruct: Given a question, retrieve passages that answer the
+ question. Query: how many Radiation Therapy fractions were administered?'
+ sentences:
+ - "Source: SOAP_Note. Date: 2024-10-03. Context: PET with large volume metastatic\
+ \ disease involving the bones, soft tissue, and lung parenchyma bilaterally.\n\
+ \ - Radiation therapy left shoulder, right SI joint, right femur completed 1/5/22.\n\
+ \ - Nivolumab and ipilimumab initiated 11/24/21. "
+ - 'Source: SOAP_Note. Date: 2019-08-21. Context: 4 weeks, Print on Rx., Instructions/Comments:
+ nivolumab. [Updated. _______ _. _____ 08/21/2019 13:56].
+
+ Cancer Regimens Nivolumab Q28D (Flat Dose, Adjuvant Melanoma): C2D1. [_______
+ _. _____ 08/21/2019 15:18].I.V. access: peripheral IV, Site: '
+ - "Source: SOAP_Note. Date: 2023-11-27. Context: per day, down from 1.5 ppd. He\
+ \ has been smoking for the past 40 years.\n He denies alcohol use.\n He worked\
+ \ for ____ ______ / _____ _____ _____ \n \n FAMILY HISTORY:\n Mother,\
+ \ age 94, Merkle cell carcinoma in her 70s. Daughter, age 52, brain tumor.\n Father,\
+ \ deceased at age 66, heart disease.\n \n REVIEW OF SYSTEMS: A comprehensive\
+ \ (10+) review of systems was performed today and was negative unless noted above.\n\
+ \ \n VITALS: Blood pressure: 128/79, Sitting, Regular, Pulse: 110, "
+ - source_sentence: 'Instruct: Given a question, retrieve passages that answer the
+ question. Query: when did the Dabrafenib Regimen start?'
+ sentences:
+ - 'Source: SOAP_Note. Date: 2018-11-29. Context: Take 1 PO daily, Instructions:
+ Take at least 1 hour before or two hours after a meal. [______ ______ 12/26/2018
+ 13:46].Dabrafenib mesylate, po solid: 75 mg Capsule Take 2 PO BID, Instructions:
+ Take whole, at least 1 hour before or two hours after a '
+ - "Source: Pathology. Date: 2021-06-22. Context: Referral: SECONDARY AND UNSPECIFIED\
+ \ MALIGNANT NEOPLASM OF LYMPH\nNODE, UNSPECIFIED\nFX4\nResults HEENT: \n\
+ HEE BRAF V600E\nNot Expressed\n1\n\n M\n19 \n1.10 78\nH\n\n1\n* A \
+ \ \nA\nI \nIntended Use:\nStains were scored by a pathologist using "
+ - "Source: SOAP_Note. Date: 2024-09-16. Context: \
+ \ Mr. _____ is married and he lives with his wife in _____ _____, __.\n The
+ \ patient has cut back to 5 cigarettes per day, down from 1.5 ppd. He has been\
+ \ smoking for the past 40 years.\n He denies alcohol use.\n He worked for Duke\
+ \ Energy / "
+ - source_sentence: 'Instruct: Given a question, retrieve passages that answer the
+ question. Query: when was the Reexcision performed?'
+ sentences:
+ - "Source: SOAP_Note. Date: 2024-06-13. Context: scan showed cutaneous involvement\
+ \ in the skin and also right inguinal adenopathy. No evidence of distant metastases.\
+ \ Opdualag _1.\n \n 10/03/2023: The patient complains of vertigo and wants to\
+ \ delay her next treatment. We will add Dramamine.\n \n "
+ - "Source: Pathology. Date: 2022-03-23. Context: MD ______, _______\n________\
+ \ ____ _________ - _______ ____ DOB: 09/14/1959\n______ ____ __ ____ Rd\
+ \ Age: 62\n__ _____ ___ Sex: Male\n___ _____, __ _____\n___-___-____\n\
+ \ 8 Accession _: ____-_____\nCollection Date: 03/23/2022\nollection Date:\
+ \ 03/23/ MRN: _____\nReceived Date: 03/23/2022\nReported Date: 03/24/2022\n\
+ SKIN, MID FRONTAL SCALP, EXCISION -\nNO EVIDENCE OF MALIGNANCY, FINAL MARGINS\
+ \ FREE OF TUMOR.\nSEE COMMENT.\nComment: Portions of deep subcutaneous fat and\
+ \ fascia are seen, all free of malignancy.\n\n_______ _. ______, MD\n**Electronically\
+ \ Signed on 24 MAR 2022 12:03PM** 8\nCLINICAL DATA:\nMID FRONTAL SCALP - EXCISION"
+ - "Source: Genetic_Testing. Date: 2023-08-21. Context: and a STERETCHING\nvariants\
+ \ including genes associated wi 08 in 7/31 \n18 comination repair deficiency\
+ \ * fusion NTR2 on \n11 (HR/HRD, microsatellite instability (MS gain\
+ \ Eston\nare umr mutational surgen 3. Kat "
+ - source_sentence: 'Instruct: Given a question, retrieve passages that answer the
+ question. Query: what is the total dose administered in the EBRT Intensity Modulated
+ Radiation Therapy?'
+ sentences:
+ - "Source: SOAP_Note. Date: 2022-10-10. Context: given. \n \n Interim History\n\
+ \ \n _____ was last seen on 09/16/2022, at which time he started adjuvant immunotherapy\
+ \ with Keytruda q21 days. Here today for follow up and labs prior to C2 of treatment.\
+ \ States he is overall feeling well. Tolerated the "
+ - "Source: SOAP_Note. Date: 2020-03-13. Context: MV electrons.\n \n FIELDS:\n The\
+ \ right orbital mass and right cervical lymph nodes were initially treated with\
+ \ a two arc IMRT plan. Arc 1: 11.4 x 21 cm. Gantry start and stop angles 178 degrees\
+ \ / 182 degrees. Arc 2: 16.4 x 13.0 cm. Gantry start "
+ - "Source: Radiology. Date: 2023-09-18. Context: : >60\n \n Contrast Type: OMNI\
+ \ 350\n Volume: 80ML\n \n Lot_: ________\n \n Exp. date: 05/26 \n Study Completed:\
+ \ CT CHEST W\n \n Reading Group:BCH \n \n Prior Studies for Comparison: 06/14/23\
+ \ CT CHEST W RMCC \n \n ________ ______\n "
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on nvidia/NV-Embed-v2
+ results:
+ - task:
+ type: patient-qa
+ name: Patient QA
+ dataset:
+ name: ontada test
+ type: ontada-test
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6856459330143541
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.9531100478468899
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.990909090909091
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 1.0
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6856459330143541
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.5208931419457735
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.39693779904306226
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.22511961722488041
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.4202789169894433
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8154078377762588
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.9453700539226855
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 1.0046297562087037
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8649347118737546
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.8190546441862219
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.804978870109979
+ name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on nvidia/NV-Embed-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nvidia/NV-Embed-v2](https://huggingface.co/nvidia/NV-Embed-v2). It maps sentences & paragraphs to a 4096-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [nvidia/NV-Embed-v2](https://huggingface.co/nvidia/NV-Embed-v2) <!-- at revision 7604d305b621f14095a1aa23d351674c2859553a -->
+ - **Maximum Sequence Length:** 1024 tokens
+ - **Output Dimensionality:** 4096 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: NVEmbedModel
+ (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
+ (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft")
+ # Run inference
+ sentences = [
+ 'Instruct: Given a question, retrieve passages that answer the question. Query: what is the total dose administered in the EBRT Intensity Modulated Radiation Therapy?',
+ 'Source: SOAP_Note. Date: 2020-03-13. Context: MV electrons.\n \n FIELDS:\n The right orbital mass and right cervical lymph nodes were initially treated with a two arc IMRT plan. Arc 1: 11.4 x 21 cm. Gantry start and stop angles 178 degrees / 182 degrees. Arc 2: 16.4 x 13.0 cm. Gantry start ',
+ 'Source: Radiology. Date: 2023-09-18. Context: : >60\n \n Contrast Type: OMNI 350\n Volume: 80ML\n \n Lot_: ________\n \n Exp. date: 05/26 \n Study Completed: CT CHEST W\n \n Reading Group:BCH \n \n Prior Studies for Comparison: 06/14/23 CT CHEST W RMCC \n \n ________ ______\n ',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 4096]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
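+
+ Two practical notes on loading, stated as assumptions based on the other files in this repository rather than as documented behaviour: the checkpoint ships custom NV-Embed code (`configuration_nvembed.py`, plus the `auto_map` entry in `config.json`), so loading will likely require `trust_remote_code=True`; and training used an instruction prefix on the query side, so queries can also be encoded by passing that prefix through the `prompt` argument instead of hard-coding it into the sentence:
+
+ ```python
+ # Hedged sketch: trust_remote_code and the explicit query prompt are assumptions
+ # inferred from this repository's files, not an officially documented invocation.
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft", trust_remote_code=True)
+
+ query_prompt = "Instruct: Given a question, retrieve passages that answer the question. Query: "
+ query_embedding = model.encode("what is the numeric dose of the Pembrolizumab Regimen?", prompt=query_prompt)
+ print(query_embedding.shape)
+ # (4096,)
+ ```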
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Patient QA
+
+ * Dataset: `ontada-test`
+ * Evaluated with <code>PatientQAEvaluator</code>, a custom evaluator (it is not part of the stock [sentence-transformers evaluation module](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html))
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6856 |
+ | cosine_accuracy@3 | 0.9531 |
+ | cosine_accuracy@5 | 0.9909 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.6856 |
+ | cosine_precision@3 | 0.5209 |
+ | cosine_precision@5 | 0.3969 |
+ | cosine_precision@10 | 0.2251 |
+ | cosine_recall@1 | 0.4203 |
+ | cosine_recall@3 | 0.8154 |
+ | cosine_recall@5 | 0.9454 |
+ | cosine_recall@10 | 1.0046 |
+ | **cosine_ndcg@10** | **0.8649** |
+ | cosine_mrr@10 | 0.8191 |
+ | cosine_map@100 | 0.805 |
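+
+ `PatientQAEvaluator` appears to be project-specific. As a rough, hypothetical stand-in, metrics of this family (accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100) can be computed with the stock `InformationRetrievalEvaluator`; the query, corpus, and relevance judgments below are illustrative placeholders, not the actual evaluation data.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("MendelAI/nv-embed-v2-ontada-twab-peft", trust_remote_code=True)
+
+ # Toy ids and texts; a real run would use the held-out question/context pairs.
+ queries = {"q1": "Instruct: Given a question, retrieve passages that answer the question. Query: when did the Dabrafenib Regimen start?"}
+ corpus = {
+     "d1": "Source: SOAP_Note. Date: 2018-11-29. Context: Dabrafenib mesylate, po solid: 75 mg Capsule Take 2 PO BID ...",
+     "d2": "Source: Radiology. Date: 2023-09-18. Context: Study Completed: CT CHEST W ...",
+ }
+ relevant_docs = {"q1": {"d1"}}
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="ontada-test")
+ results = evaluator(model)
+ print(results)  # cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, cosine_map@100, ...
+ ```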
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 16,186 training samples
+ * Columns: <code>question</code> and <code>context</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | question | context |
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 25 tokens</li><li>mean: 30.78 tokens</li><li>max: 39 tokens</li></ul> | <ul><li>min: 74 tokens</li><li>mean: 177.84 tokens</li><li>max: 398 tokens</li></ul> |
+ * Samples:
+ | question | context |
+ |:------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?</code> | <code>Source: Genetic_Testing. Date: 2022-10-07. Context: Mutational Seq DNA-Tumor Low, 6 mt/Mb NF1<br>Seq DNA-Tumor Mutation Not Detected<br>T In Not D<br>ARID2 Seq DNA-Tumor Mutation Not Detected CNA-Seq DNA-Tumor Deletion Not Detected<br> PTEN<br>Seq RNA-Tumor Fusion Not Detected Seq DNA-Tumor Mutation Not Detected<br>BRAF <br> Amplification Not _<br>CNA-Seq DNA-Tumor Detected RAC1 Seq DNA-Tumor Mutation Not Detected<br>The selection of any, all, or none of the matched therapies </code> |
+ | <code>Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?</code> | <code>Source: Genetic_Testing. Date: 2021-06-04. Context: characteristics have been determined by _____ ___________<br>_______ _________ ___ ____ __________. It has not been<br>cleared or approved by FDA. This assay has been validated<br>pursuant to the CLIA regulations and is used for clinical<br>purposes.<br>BRAF MUTATION ANALYSIS E<br>SOURCE: LYMPH NODE<br>PARAFFIN BLOCK NUMBER: ____-_______ A4<br>BRAF MUTATION ANALYSIS NOT DETECTED NOT DETECTED<br>This result was reviewed and interpreted by _. ____, M.D.<br>Based on Sanger sequencing analysis, no mutations </code> |
+ | <code>Instruct: Given a question, retrieve passages that answer the question. Query: what was the abnormality identified for BRAF?</code> | <code>Source: Pathology. Date: 2019-12-12. Context: Receive Date: 12/12/2019<br>___ _: ________________ Accession Date: 12/12/2019<br>Copy To: Report Date: 12/19/2019 18:16<br>***SUPPLEMENTAL REPORT***<br>(previous report date: 12/19/2019)<br>BRAF SNAPSHOT<br>Results:<br>POSITIVE<br>Interpretation:<br>A BRAF mutation was detected in the provided specimen.<br>FDA has approved TKI inhibitor vemurafenib and dabrafenib for the first-line treatment of patients with<br>unresectable or metastatic melanoma whose tumors have a BRAF V600E mutation, and trametinib for tumors<br></code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+ "scale": 20.0,
+ "similarity_fct": "cos_sim"
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 64
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `seed`: 6789
+ - `bf16`: True
+ - `prompts`: {'question': 'Instruct: Given a question, retrieve passages that answer the question. Query: '}
+ - `batch_sampler`: no_duplicates
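+
+ For orientation, the sketch below shows how a comparable run could be wired up with these non-default values in current sentence-transformers. It is an assumption-laden reconstruction, not the actual training script: the dataset, output path, and evaluation setup are placeholders.
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
+                                    SentenceTransformerTrainingArguments)
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ from sentence_transformers.training_args import BatchSamplers
+
+ model = SentenceTransformer("nvidia/NV-Embed-v2", trust_remote_code=True)
+
+ # Placeholder data; the real dataset holds 16,186 question/context pairs.
+ train_dataset = Dataset.from_dict({
+     "question": ["what is the numeric dose of the Pembrolizumab Regimen?"],
+     "context": ["Source: SOAP_Note. Date: 2024-02-17. Context: ... Pembrolizumab Q21D (Flat Dose) ..."],
+ })
+
+ # scale=20.0 with the default cosine similarity matches the loss parameters above.
+ loss = MultipleNegativesRankingLoss(model, scale=20.0)
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs/nv-embed-v2-ontada-twab-peft",  # hypothetical path
+     per_device_train_batch_size=4,
+     per_device_eval_batch_size=64,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     seed=6789,
+     bf16=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+     prompts={"question": "Instruct: Given a question, retrieve passages that answer the question. Query: "},
+     # eval_strategy="steps" was used in the real run; it additionally needs an
+     # eval_dataset or evaluator to be passed to the trainer.
+ )
+
+ trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```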
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 6789
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `prompts`: {'question': 'Instruct: Given a question, retrieve passages that answer the question. Query: '}
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | ontada-test_cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:--------------------------:|
+ | 0 | 0 | - | 0.8431 |
+ | 0.0002 | 1 | 1.5826 | - |
+ | 0.0371 | 150 | 0.4123 | - |
+ | 0.0741 | 300 | 0.3077 | - |
+ | 0.1112 | 450 | 0.2184 | - |
+ | 0.1483 | 600 | 0.3291 | - |
+ | 0.1853 | 750 | 0.2343 | - |
+ | 0.2224 | 900 | 0.2506 | - |
+ | 0.2471 | 1000 | - | 0.8077 |
+ | 0.2595 | 1050 | 0.1294 | - |
+ | 0.2965 | 1200 | 0.0158 | - |
+ | 0.3336 | 1350 | 0.0189 | - |
+ | 0.3706 | 1500 | 0.0363 | - |
+ | 0.4077 | 1650 | 0.0208 | - |
+ | 0.4448 | 1800 | 0.475 | - |
+ | 0.4818 | 1950 | 0.6183 | - |
+ | 0.4942 | 2000 | - | 0.8482 |
+ | 0.5189 | 2100 | 0.4779 | - |
+ | 0.5560 | 2250 | 0.4194 | - |
+ | 0.5930 | 2400 | 0.8376 | - |
+ | 0.6301 | 2550 | 0.4249 | - |
+ | 0.6672 | 2700 | 0.9336 | - |
+ | 0.7042 | 2850 | 0.5351 | - |
+ | 0.7413 | 3000 | 1.0253 | 0.8551 |
+ | 0.7784 | 3150 | 0.3961 | - |
+ | 0.8154 | 3300 | 0.3881 | - |
+ | 0.8525 | 3450 | 0.5573 | - |
+ | 0.8895 | 3600 | 1.222 | - |
+ | 0.9266 | 3750 | 0.3032 | - |
+ | 0.9637 | 3900 | 0.3142 | - |
+ | 0.9884 | 4000 | - | 0.8645 |
+ | 1.0 | 4047 | - | 0.8649 |
+
+
+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.4.0.dev0
+ - Transformers: 4.46.0
+ - PyTorch: 2.3.1+cu121
+ - Accelerate: 1.0.1
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,103 @@
+ {
+ "_name_or_path": "/workspace/data/june/sentence-transformers/outputs/2024-11-18/19-28-23/models/nv-embed-v2-ontada-twab-peft/final/",
+ "add_eos": true,
+ "add_pad_token": true,
+ "architectures": [
+ "NVEmbedModel"
+ ],
+ "auto_map": {
+ "AutoConfig": "configuration_nvembed.NVEmbedConfig",
+ "AutoModel": "modeling_nvembed.NVEmbedModel"
+ },
+ "hidden_size": 4096,
+ "is_mask_instruction": true,
+ "latent_attention_config": {
+ "model_type": "latent_attention"
+ },
+ "mask_type": "b",
+ "model_type": "nvembed",
+ "padding_side": "right",
+ "text_config": {
+ "_attn_implementation_autoset": false,
+ "_name_or_path": "nvidia/NV-Embed-v2",
+ "add_cross_attention": false,
+ "architectures": [
+ "MistralModel"
+ ],
+ "attention_dropout": 0.0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": 1,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": 2,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 4096,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 14336,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_position_embeddings": 32768,
+ "min_length": 0,
+ "model_type": "bidir_mistral",
+ "no_repeat_ngram_size": 0,
+ "num_attention_heads": 32,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_hidden_layers": 32,
+ "num_key_value_heads": 8,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "rms_norm_eps": 1e-05,
+ "rope_theta": 10000.0,
+ "sep_token_id": null,
+ "sliding_window": 4096,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": false,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torch_dtype": "float32",
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false,
+ "use_cache": true,
+ "vocab_size": 32000
+ },
+ "torch_dtype": "float32",
+ "transformers_version": "4.46.0"
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.0.dev0",
+ "transformers": "4.46.0",
+ "pytorch": "2.3.1+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
configuration_nvembed.py ADDED
@@ -0,0 +1,90 @@
+
+ from typing import Literal
+ from transformers import AutoConfig
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.models.auto import CONFIG_MAPPING
+ from transformers.models.mistral import MistralConfig
+
+ NVEMBED_TYPE = "nvembed"
+ LATENT_ATTENTION_TYPE = "latent_attention"
+ BIDIR_MISTRAL_TYPE = "bidir_mistral"
+
+ class NVEmbedConfig(PretrainedConfig):
+     model_type = "nvembed"
+     is_composition = False
+
+     def __init__(
+         self,
+         latent_attention_config=None,
+         text_config=None,
+         padding_side: Literal["right", "left"]="right",
+         add_pad_token: bool=True,
+         is_mask_instruction: bool = True,
+         add_eos: bool=True,
+         mask_type: str="b",
+         **kwargs,
+     ):
+         if isinstance(latent_attention_config, dict):
+             latent_attention_config["model_type"] = (
+                 latent_attention_config["model_type"] if "model_type" in latent_attention_config else LATENT_ATTENTION_TYPE
+             )
+             latent_attention_config = CONFIG_MAPPING[latent_attention_config["model_type"]](**latent_attention_config)
+         elif latent_attention_config is None:
+             latent_attention_config = CONFIG_MAPPING[LATENT_ATTENTION_TYPE]()
+
+         self.latent_attention_config = latent_attention_config
+
+         if isinstance(text_config, dict):
+             text_config["model_type"] = text_config["model_type"] if "model_type" in text_config else "llama"
+             text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
+         elif text_config is None:
+             text_config = None
+
+         self.text_config = text_config
+         self.padding_side = padding_side
+         self.is_mask_instruction = is_mask_instruction
+         self.add_pad_token = add_pad_token
+         self.add_eos = add_eos
+         self.mask_type = mask_type
+         if "hidden_size" in kwargs:
+             self.hidden_size = kwargs["hidden_size"]
+         else:
+             self.hidden_size = 4096
+
+         super().__init__(**kwargs)
+
+
+ class LatentAttentionConfig(PretrainedConfig):
+     model_type = LATENT_ATTENTION_TYPE
+     is_composition = False
+     _name_or_path = "latent_attention"
+
+     def __init__(
+         self,
+         num_latents_value: int=512,
+         num_cross_heads: int=8,
+         output_normalize: bool=True,
+         hidden_dim: int=4096,
+         latent_dim: int=4096,
+         cross_dim_head: int=4096,
+         **kwargs,
+     ):
+         self.num_latents_value = num_latents_value
+         self.num_cross_heads = num_cross_heads
+         self.output_normalize = output_normalize
+         self.hidden_dim = hidden_dim
+         self.latent_dim = latent_dim
+         self.cross_dim_head = cross_dim_head
+
+
+ class BidirectionalMistralConfig(MistralConfig):
+     model_type = BIDIR_MISTRAL_TYPE
+     keys_to_ignore_at_inference = ["past_key_values"]
+
+ AutoConfig.register(NVEMBED_TYPE, NVEmbedConfig)
+ AutoConfig.register(LATENT_ATTENTION_TYPE, LatentAttentionConfig)
+ AutoConfig.register(BIDIR_MISTRAL_TYPE, BidirectionalMistralConfig)
+
+ NVEmbedConfig.register_for_auto_class()
+ LatentAttentionConfig.register_for_auto_class()
+ BidirectionalMistralConfig.register_for_auto_class()
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9da5d3a0f4722c5aaec4251748f9c531c07da032cf9ccac44af75e76862b1005
+ size 4995698456
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb80eefa9ae938158283d57b41011cfe7dedad39d28eb5b3d5757e6fb662185a
+ size 4999813600
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86a3be3f0deb8e186c216b75a1a31cb3547c4007e9488aaba139e69b0c687573
+ size 4999813624
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60dfb9d521133071297fe06f0436fd4abe5da4f8c8e545dcad083579ff957944
+ size 4832007968
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b22269d0eb46b0e767871becd03a0d72eeb5d577508cb93699c4c513c5919ab
+ size 4999813656
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01e3c07a33f15646debcffcfb5d8fead6f9d134677e7d49623bbb8c08b7a8a56
+ size 4999813656
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:335f0defa0fe29f91969a1b8cd7cd15c5fe68b0129f99342f5d9d7096b6c06b8
+ size 1577142096
model.safetensors.index.json ADDED
@@ -0,0 +1,311 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 31404064768
4
+ },
5
+ "weight_map": {
6
+ "embedding_model.embed_tokens.weight": "model-00001-of-00007.safetensors",
7
+ "embedding_model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
8
+ "embedding_model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
9
+ "embedding_model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
10
+ "embedding_model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
11
+ "embedding_model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
12
+ "embedding_model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
13
+ "embedding_model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
14
+ "embedding_model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
15
+ "embedding_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
16
+ "embedding_model.layers.1.input_layernorm.weight": "model-00002-of-00007.safetensors",
17
+ "embedding_model.layers.1.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
18
+ "embedding_model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
19
+ "embedding_model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
20
+ "embedding_model.layers.1.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
21
+ "embedding_model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
22
+ "embedding_model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
23
+ "embedding_model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
24
+ "embedding_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
25
+ "embedding_model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
26
+ "embedding_model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
27
+ "embedding_model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
28
+ "embedding_model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
29
+ "embedding_model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
30
+ "embedding_model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
31
+ "embedding_model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
32
+ "embedding_model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
33
+ "embedding_model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
34
+ "embedding_model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
35
+ "embedding_model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
36
+ "embedding_model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
37
+ "embedding_model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
38
+ "embedding_model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
39
+ "embedding_model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
40
+ "embedding_model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
41
+ "embedding_model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
42
+ "embedding_model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
43
+ "embedding_model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
44
+ "embedding_model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
45
+ "embedding_model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
46
+ "embedding_model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
47
+ "embedding_model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
48
+ "embedding_model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
49
+ "embedding_model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
50
+ "embedding_model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
51
+ "embedding_model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
52
+ "embedding_model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
53
+ "embedding_model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
54
+ "embedding_model.layers.13.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
55
+ "embedding_model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
56
+ "embedding_model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
57
+ "embedding_model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
58
+ "embedding_model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
59
+ "embedding_model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
60
+ "embedding_model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
61
+ "embedding_model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
62
+ "embedding_model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
63
+ "embedding_model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
64
+ "embedding_model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
65
+ "embedding_model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
66
+ "embedding_model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
67
+ "embedding_model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
68
+ "embedding_model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
69
+ "embedding_model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
70
+ "embedding_model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
71
+ "embedding_model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
72
+ "embedding_model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
73
+ "embedding_model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
74
+ "embedding_model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
75
+ "embedding_model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
76
+ "embedding_model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
77
+ "embedding_model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
78
+ "embedding_model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
79
+ "embedding_model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
80
+ "embedding_model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
81
+ "embedding_model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
82
+ "embedding_model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
83
+ "embedding_model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
84
+ "embedding_model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
85
+ "embedding_model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
86
+ "embedding_model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
87
+ "embedding_model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
88
+ "embedding_model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
89
+ "embedding_model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
90
+ "embedding_model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
91
+ "embedding_model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
92
+ "embedding_model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
93
+ "embedding_model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
94
+ "embedding_model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
95
+ "embedding_model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
96
+ "embedding_model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
97
+ "embedding_model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
98
+ "embedding_model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
99
+ "embedding_model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
100
+ "embedding_model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
101
+ "embedding_model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
102
+ "embedding_model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
103
+ "embedding_model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
104
+ "embedding_model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
105
+ "embedding_model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
106
+ "embedding_model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
107
+ "embedding_model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
108
+ "embedding_model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
109
+ "embedding_model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
110
+ "embedding_model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
111
+ "embedding_model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
112
+ "embedding_model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
113
+ "embedding_model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
114
+ "embedding_model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
115
+ "embedding_model.layers.2.input_layernorm.weight": "model-00002-of-00007.safetensors",
116
+ "embedding_model.layers.2.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
117
+ "embedding_model.layers.2.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
118
+ "embedding_model.layers.2.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
119
+ "embedding_model.layers.2.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
120
+ "embedding_model.layers.2.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
121
+ "embedding_model.layers.2.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
122
+ "embedding_model.layers.2.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
123
+ "embedding_model.layers.2.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
124
+ "embedding_model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
125
+ "embedding_model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
126
+ "embedding_model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
127
+ "embedding_model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
128
+ "embedding_model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
129
+ "embedding_model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
130
+ "embedding_model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
131
+ "embedding_model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
132
+ "embedding_model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
133
+ "embedding_model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
134
+ "embedding_model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
135
+ "embedding_model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
136
+ "embedding_model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
137
+ "embedding_model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
138
+ "embedding_model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
139
+ "embedding_model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
140
+ "embedding_model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
141
+ "embedding_model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
142
+ "embedding_model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
143
+ "embedding_model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
144
+ "embedding_model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
145
+ "embedding_model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
146
+ "embedding_model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
147
+ "embedding_model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
148
+ "embedding_model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
149
+ "embedding_model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
150
+ "embedding_model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
151
+ "embedding_model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
152
+ "embedding_model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
153
+ "embedding_model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
154
+ "embedding_model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
155
+ "embedding_model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
156
+ "embedding_model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
157
+ "embedding_model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
158
+ "embedding_model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
159
+ "embedding_model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
160
+ "embedding_model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
161
+ "embedding_model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
162
+ "embedding_model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
163
+ "embedding_model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
164
+ "embedding_model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
165
+ "embedding_model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
166
+ "embedding_model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
167
+ "embedding_model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
168
+ "embedding_model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
169
+ "embedding_model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
170
+ "embedding_model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
171
+ "embedding_model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
172
+ "embedding_model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
173
+ "embedding_model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
174
+ "embedding_model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
175
+ "embedding_model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
176
+ "embedding_model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
177
+ "embedding_model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
178
+ "embedding_model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
179
+ "embedding_model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
180
+ "embedding_model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
181
+ "embedding_model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
182
+ "embedding_model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
183
+ "embedding_model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
184
+ "embedding_model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
185
+ "embedding_model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
186
+ "embedding_model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
187
+ "embedding_model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
188
+ "embedding_model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
189
+ "embedding_model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
190
+ "embedding_model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
191
+ "embedding_model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
192
+ "embedding_model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
193
+ "embedding_model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
194
+ "embedding_model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
195
+ "embedding_model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
196
+ "embedding_model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
197
+ "embedding_model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
198
+ "embedding_model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
199
+ "embedding_model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
200
+ "embedding_model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
201
+ "embedding_model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
202
+ "embedding_model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
203
+ "embedding_model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
204
+ "embedding_model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
205
+ "embedding_model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
206
+ "embedding_model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
207
+ "embedding_model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
208
+ "embedding_model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
209
+ "embedding_model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
210
+ "embedding_model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
211
+ "embedding_model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
212
+ "embedding_model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
213
+ "embedding_model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
214
+ "embedding_model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
215
+ "embedding_model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
216
+ "embedding_model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
217
+ "embedding_model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
218
+ "embedding_model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
219
+ "embedding_model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
220
+ "embedding_model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
221
+ "embedding_model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
222
+ "embedding_model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
223
+ "embedding_model.layers.30.input_layernorm.weight": "model-00007-of-00007.safetensors",
224
+ "embedding_model.layers.30.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
225
+ "embedding_model.layers.30.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
226
+ "embedding_model.layers.30.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
227
+ "embedding_model.layers.30.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
228
+ "embedding_model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
229
+ "embedding_model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
230
+ "embedding_model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
231
+ "embedding_model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
232
+ "embedding_model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
233
+ "embedding_model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
234
+ "embedding_model.layers.31.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
235
+ "embedding_model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
236
+ "embedding_model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
237
+ "embedding_model.layers.31.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
238
+ "embedding_model.layers.31.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
239
+ "embedding_model.layers.31.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
240
+ "embedding_model.layers.31.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
241
+ "embedding_model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
242
+ "embedding_model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
243
+ "embedding_model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
244
+ "embedding_model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
245
+ "embedding_model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
246
+ "embedding_model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
247
+ "embedding_model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
248
+ "embedding_model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
249
+ "embedding_model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
250
+ "embedding_model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
251
+ "embedding_model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
252
+ "embedding_model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
253
+ "embedding_model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
254
+ "embedding_model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
255
+ "embedding_model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
256
+ "embedding_model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
257
+ "embedding_model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
258
+ "embedding_model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
259
+ "embedding_model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
260
+ "embedding_model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
261
+ "embedding_model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
262
+ "embedding_model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
263
+ "embedding_model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
264
+ "embedding_model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
265
+ "embedding_model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
266
+ "embedding_model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
267
+ "embedding_model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
268
+ "embedding_model.layers.7.input_layernorm.weight": "model-00003-of-00007.safetensors",
269
+ "embedding_model.layers.7.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
270
+ "embedding_model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
271
+ "embedding_model.layers.7.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
272
+ "embedding_model.layers.7.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
273
+ "embedding_model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
274
+ "embedding_model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
275
+ "embedding_model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
276
+ "embedding_model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
277
+ "embedding_model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
278
+ "embedding_model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
279
+ "embedding_model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
280
+ "embedding_model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
281
+ "embedding_model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
282
+ "embedding_model.layers.8.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
283
+ "embedding_model.layers.8.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
284
+ "embedding_model.layers.8.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
285
+ "embedding_model.layers.8.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
286
+ "embedding_model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
287
+ "embedding_model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
288
+ "embedding_model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
289
+ "embedding_model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
290
+ "embedding_model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
291
+ "embedding_model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
292
+ "embedding_model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
293
+ "embedding_model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
294
+ "embedding_model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
295
+ "embedding_model.norm.weight": "model-00007-of-00007.safetensors",
296
+ "latent_attention_model.cross_attend_blocks.0.fn.to_kv.weight": "model-00001-of-00007.safetensors",
297
+ "latent_attention_model.cross_attend_blocks.0.fn.to_out.weight": "model-00001-of-00007.safetensors",
298
+ "latent_attention_model.cross_attend_blocks.0.fn.to_q.weight": "model-00001-of-00007.safetensors",
299
+ "latent_attention_model.cross_attend_blocks.0.norm.bias": "model-00001-of-00007.safetensors",
300
+ "latent_attention_model.cross_attend_blocks.0.norm.weight": "model-00001-of-00007.safetensors",
301
+ "latent_attention_model.cross_attend_blocks.0.norm_context.bias": "model-00001-of-00007.safetensors",
302
+ "latent_attention_model.cross_attend_blocks.0.norm_context.weight": "model-00001-of-00007.safetensors",
303
+ "latent_attention_model.cross_attend_blocks.1.fn.net.0.bias": "model-00001-of-00007.safetensors",
304
+ "latent_attention_model.cross_attend_blocks.1.fn.net.0.weight": "model-00001-of-00007.safetensors",
305
+ "latent_attention_model.cross_attend_blocks.1.fn.net.2.bias": "model-00001-of-00007.safetensors",
306
+ "latent_attention_model.cross_attend_blocks.1.fn.net.2.weight": "model-00001-of-00007.safetensors",
307
+ "latent_attention_model.cross_attend_blocks.1.norm.bias": "model-00001-of-00007.safetensors",
308
+ "latent_attention_model.cross_attend_blocks.1.norm.weight": "model-00001-of-00007.safetensors",
309
+ "latent_attention_model.latents": "model-00001-of-00007.safetensors"
310
+ }
311
+ }
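The index above maps each parameter name to one of seven safetensors shards. Below is a minimal sketch of resolving a single tensor from that map, assuming a local snapshot directory and the conventional `model.safetensors.index.json` file name (transformers normally performs this lookup automatically when loading the model):

```python
import json
from safetensors.torch import load_file

ckpt_dir = "./local-snapshot"  # assumption: directory holding the shards and the index file

with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

name = "embedding_model.layers.27.self_attn.q_proj.weight"
shard = weight_map[name]                      # e.g. "model-00006-of-00007.safetensors"
tensor = load_file(f"{ckpt_dir}/{shard}")[name]
print(name, tuple(tensor.shape))
```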
modeling_nvembed.py ADDED
@@ -0,0 +1,441 @@
1
+ from typing import List, Union, Dict, Mapping, Optional, Tuple, TypedDict
2
+ import torch
3
+ import os
4
+ import json
5
+ import numpy as np
6
+ from functools import partial
7
+ from contextlib import nullcontext
8
+ from transformers import AutoModel, PreTrainedTokenizerFast, BatchEncoding, DataCollatorWithPadding
9
+ from transformers.modeling_utils import PreTrainedModel
10
+ from transformers.models.auto import AutoTokenizer
11
+ from transformers.models.mistral.modeling_mistral import MISTRAL_INPUTS_DOCSTRING
12
+ from transformers.modeling_outputs import BaseModelOutputWithPast
13
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_attention_mask_for_sdpa
14
+ from transformers import MistralModel, MistralConfig
15
+ from transformers.cache_utils import Cache, DynamicCache
16
+ from transformers.utils import (
17
+ add_start_docstrings_to_model_forward,
18
+ logging,
19
+ )
20
+ from einops import rearrange, repeat
21
+ from tqdm.auto import tqdm
22
+ from datasets import Dataset
23
+ from torch.utils.data import DataLoader
24
+ from .configuration_nvembed import NVEmbedConfig, LatentAttentionConfig, BidirectionalMistralConfig
25
+
26
+ logger = logging.get_logger(__name__)
27
+
28
+ class NVEmbedFeatures(TypedDict):
29
+ input_dict: torch.Tensor
30
+ attention_mask: torch.Tensor
31
+ pool_mask: torch.Tensor
32
+
33
+ class BidirectionalMistralModel(MistralModel):
34
+ config_class = BidirectionalMistralConfig
35
+
36
+ def __init__(self, config: MistralConfig):
37
+ super().__init__(config)
38
+ for layer in self.layers:
39
+ layer.self_attn.is_causal = False
40
+ self._attn_implementation = "eager"
41
+
42
+ @add_start_docstrings_to_model_forward(MISTRAL_INPUTS_DOCSTRING)
43
+ def forward(
44
+ self,
45
+ input_ids: torch.LongTensor = None,
46
+ attention_mask: Optional[torch.Tensor] = None,
47
+ position_ids: Optional[torch.LongTensor] = None,
48
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
49
+ inputs_embeds: Optional[torch.FloatTensor] = None,
50
+ use_cache: Optional[bool] = None,
51
+ output_attentions: Optional[bool] = None,
52
+ output_hidden_states: Optional[bool] = None,
53
+ return_dict: Optional[bool] = None,
54
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
55
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
56
+ output_hidden_states = (
57
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
58
+ )
59
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
60
+
61
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
62
+
63
+ # retrieve input_ids and inputs_embeds
64
+ if input_ids is not None and inputs_embeds is not None:
65
+ raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
66
+ elif input_ids is not None:
67
+ batch_size, seq_length = input_ids.shape
68
+ elif inputs_embeds is not None:
69
+ batch_size, seq_length, _ = inputs_embeds.shape
70
+ else:
71
+ raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
72
+
73
+ if self.gradient_checkpointing and self.training:
74
+ if use_cache:
75
+ logger.warning_once(
76
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
77
+ )
78
+ use_cache = False
79
+
80
+ past_key_values_length = 0
81
+
82
+ if use_cache:
83
+ use_legacy_cache = not isinstance(past_key_values, Cache)
84
+ if use_legacy_cache:
85
+ past_key_values = DynamicCache.from_legacy_cache(past_key_values)
86
+ past_key_values_length = past_key_values.get_usable_length(seq_length)
87
+
88
+ if position_ids is None:
89
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
90
+ position_ids = torch.arange(
91
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
92
+ )
93
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
94
+ else:
95
+ position_ids = position_ids.view(-1, seq_length).long()
96
+
97
+ if inputs_embeds is None:
98
+ inputs_embeds = self.embed_tokens(input_ids)
99
+
100
+ if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
101
+ is_padding_right = attention_mask[:, -1].sum().item() != batch_size
102
+ if is_padding_right:
103
+ raise ValueError(
104
+ "You are attempting to perform batched generation with padding_side='right'"
105
+ " this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to "
106
+ " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
107
+ )
108
+
109
+ if self._attn_implementation == "flash_attention_2":
110
+ # 2d mask is passed through the layers
111
+ attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None
112
+ elif self._attn_implementation == "sdpa" and not output_attentions:
113
+ # output_attentions=True can not be supported when using SDPA, and we fall back on
114
+ # the manual implementation that requires a 4D causal mask in all cases.
115
+ attention_mask = _prepare_4d_attention_mask_for_sdpa(
116
+ attention_mask, inputs_embeds.dtype
117
+ )
118
+ else:
119
+ # 4d mask is passed through the layers
120
+ attention_mask = _prepare_4d_attention_mask(
121
+ attention_mask, inputs_embeds.dtype,
122
+ )
123
+
124
+ hidden_states = inputs_embeds
125
+
126
+ # decoder layers
127
+ all_hidden_states = () if output_hidden_states else None
128
+ all_self_attns = () if output_attentions else None
129
+ next_decoder_cache = None
130
+
131
+ for decoder_layer in self.layers:
132
+ if output_hidden_states:
133
+ all_hidden_states += (hidden_states,)
134
+
135
+ if self.gradient_checkpointing and self.training:
136
+ layer_outputs = self._gradient_checkpointing_func(
137
+ decoder_layer.__call__,
138
+ hidden_states,
139
+ attention_mask,
140
+ position_ids,
141
+ past_key_values,
142
+ output_attentions,
143
+ use_cache,
144
+ )
145
+ else:
146
+ layer_outputs = decoder_layer(
147
+ hidden_states,
148
+ attention_mask=attention_mask,
149
+ position_ids=position_ids,
150
+ past_key_value=past_key_values,
151
+ output_attentions=output_attentions,
152
+ use_cache=use_cache,
153
+ )
154
+
155
+ hidden_states = layer_outputs[0]
156
+
157
+ if use_cache:
158
+ next_decoder_cache = layer_outputs[2 if output_attentions else 1]
159
+
160
+ if output_attentions:
161
+ all_self_attns += (layer_outputs[1],)
162
+
163
+ hidden_states = self.norm(hidden_states)
164
+
165
+ # add hidden states from the last decoder layer
166
+ if output_hidden_states:
167
+ all_hidden_states += (hidden_states,)
168
+
169
+ next_cache = None
170
+ if use_cache:
171
+ next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
172
+
173
+ if not return_dict:
174
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
175
+ return BaseModelOutputWithPast(
176
+ last_hidden_state=hidden_states,
177
+ past_key_values=next_cache,
178
+ hidden_states=all_hidden_states,
179
+ attentions=all_self_attns,
180
+ )
181
+
182
+ def _move_to_device(maybe_tensor, device: torch.device):
183
+ if torch.is_tensor(maybe_tensor):
184
+ return maybe_tensor.to(device, non_blocking=device.type == "cuda")
185
+ elif isinstance(maybe_tensor, dict):
186
+ return {key: _move_to_device(value, device) for key, value in maybe_tensor.items()}
187
+ elif isinstance(maybe_tensor, list):
188
+ return [_move_to_device(x, device) for x in maybe_tensor]
189
+ elif isinstance(maybe_tensor, tuple):
190
+ return tuple([_move_to_device(x, device) for x in maybe_tensor])
191
+ elif isinstance(maybe_tensor, Mapping):
192
+ return type(maybe_tensor)({k: _move_to_device(v, device) for k, v in maybe_tensor.items()})
193
+ else:
194
+ return maybe_tensor
195
+
196
+ def move_to_device(sample, device: torch.device):
197
+ if device.type == "cpu":
198
+ return sample
199
+
200
+ if len(sample) == 0:
201
+ return {}
202
+ return _move_to_device(sample, device)
203
+
204
+
205
+ def input_transform_func(
206
+ tokenizer: PreTrainedTokenizerFast,
207
+ examples: Dict[str, List],
208
+ always_add_eos: bool,
209
+ max_length: int,
210
+ instruction: str,
211
+ ) -> BatchEncoding:
212
+ if always_add_eos:
213
+ examples['input_texts'] = [instruction + input_example + tokenizer.eos_token for input_example in examples['input_texts']]
214
+ batch_dict = tokenizer(
215
+ examples['input_texts'],
216
+ max_length=max_length,
217
+ padding=True,
218
+ return_token_type_ids=False,
219
+ return_tensors="pt",
220
+ truncation=True)
221
+ return batch_dict
222
+
223
+
224
+ class PreNorm(torch.nn.Module):
225
+ def __init__(self, dim, fn, context_dim = None):
226
+ super().__init__()
227
+ self.fn = fn
228
+ self.norm = torch.nn.LayerNorm(dim)
229
+ self.norm_context = torch.nn.LayerNorm(context_dim) if exists(context_dim) else None
230
+
231
+ def forward(self, x, **kwargs):
232
+ x = self.norm(x)
233
+ if exists(self.norm_context):
234
+ context = kwargs['context']
235
+ normed_context = self.norm_context(context)
236
+ kwargs.update(context = normed_context)
237
+ return self.fn(x, **kwargs)
238
+
239
+ class GEGLU(torch.nn.Module):
240
+ def forward(self, x):
241
+ x, gates = x.chunk(2, dim = -1)
242
+ return x * torch.nn.functional.gelu(gates)
243
+
244
+ class FeedForward(torch.nn.Module):
245
+ def __init__(self, dim, mult = 4):
246
+ super().__init__()
247
+ self.net = torch.nn.Sequential(torch.nn.Linear(dim, dim * mult * 2),
248
+ GEGLU(),
249
+ torch.nn.Linear(dim * mult, dim))
250
+
251
+ def forward(self, x):
252
+ return self.net(x)
253
+
254
+ def exists(val):
255
+ return val is not None
256
+
257
+ def default(val, d):
258
+ return val if exists(val) else d
259
+
260
+
261
+ class Attention(torch.nn.Module):
262
+ def __init__(self, query_dim, context_dim = None, heads = 8, dim_head = 64):
263
+ super().__init__()
264
+ inner_dim = dim_head * heads
265
+ context_dim = default(context_dim, query_dim)
266
+ self.scale = dim_head ** -0.5
267
+ self.heads = heads
268
+
269
+ self.to_q = torch.nn.Linear(query_dim, inner_dim, bias = False)
270
+ self.to_kv = torch.nn.Linear(context_dim, inner_dim * 2, bias = False)
271
+ self.to_out = torch.nn.Linear(inner_dim, query_dim, bias = False)
272
+
273
+ def forward(self, x, context = None, mask = None):
274
+ h = self.heads
275
+ q = self.to_q(x)
276
+ context = default(context, x)
277
+ k, v = self.to_kv(context).chunk(2, dim = -1)
278
+ q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h = h), (q, k, v))
279
+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True):
280
+ out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
281
+ out = rearrange(out, '(b h) n d -> b n (h d)', h = h)
282
+ return self.to_out(out)
283
+
284
+
285
+ class LatentAttentionModel(PreTrainedModel):
286
+ config_class = LatentAttentionConfig
287
+
288
+ def __init__(self, config: LatentAttentionConfig):
289
+ super().__init__(config)
290
+ ## cross-attention block
291
+ num_latents, latent_dim, cross_heads, cross_dim_head = config.num_latents_value, config.latent_dim, config.num_cross_heads, config.cross_dim_head
292
+ dim = config.hidden_dim
293
+ # init latent_attention and latents
294
+ self.cross_attend_blocks = torch.nn.ModuleList([
295
+ PreNorm(latent_dim, Attention(latent_dim, dim, heads = cross_heads, dim_head = cross_dim_head),
296
+ context_dim = dim),
297
+ PreNorm(latent_dim, FeedForward(latent_dim)),
298
+ ])
299
+ self.output_normalize = config.output_normalize
300
+ self.register_parameter("latents", torch.nn.Parameter(torch.randn(num_latents, latent_dim)))
301
+
302
+ def forward(self, hiddens, attention_mask: torch.Tensor=None):
303
+ ## cross-attention block
304
+ cross_attn, cross_ff = self.cross_attend_blocks
305
+ b, *_, device = *hiddens.shape, hiddens.device
306
+ x = repeat(self.latents, 'n d -> b n d', b = b)
307
+ hiddens = cross_attn(hiddens, context = x, mask = None) + hiddens
308
+ hiddens = cross_ff(hiddens) + hiddens
309
+ if attention_mask is not None:
310
+ s = torch.sum(hiddens * attention_mask.unsqueeze(-1).float(), dim=1)
311
+ d = attention_mask.sum(dim=1, keepdim=True).float()
312
+ hiddens = s / d
313
+ if self.output_normalize:
314
+ hiddens = torch.nn.functional.normalize(hiddens, p=2, dim=-1)
315
+ return hiddens
316
+
317
+ class NVEmbedModel(PreTrainedModel):
318
+ config_class = NVEmbedConfig
319
+ _no_split_modules = ["MistralDecoderLayer", "LatentAttentionModel"]
320
+
321
+ def __init__(self, config: NVEmbedConfig):
322
+ super().__init__(config)
323
+ self.latent_attention_model = AutoModel.from_config(config.latent_attention_config)
324
+ self.embedding_model = AutoModel.from_config(
325
+ config.text_config,
326
+ ) if config.text_config is not None else None
327
+ self.tokenizer = AutoTokenizer.from_pretrained(config.text_config._name_or_path) if config.text_config is not None else None
328
+ self.padding_side = config.padding_side
329
+ self.is_mask_instruction = config.is_mask_instruction
330
+ self.add_eos = config.add_eos
331
+ self.mask_type = config.mask_type
332
+ if config.add_pad_token and self.tokenizer is not None:
333
+ self.add_pad_token()
334
+
335
+ def add_pad_token(self):
336
+ self.tokenizer.pad_token = self.tokenizer.eos_token
337
+ self.tokenizer.padding_side = self.padding_side
338
+
339
+ def prepare_kwargs_from_batch(self, batch_dict: dict, instruction_lens: int, device: torch.device):
340
+ batch_dict = move_to_device(batch_dict, device)
341
+ attention_mask = batch_dict['attention_mask'].clone() if 'attention_mask' in batch_dict else None
342
+ if (attention_mask is not None and
343
+ self.padding_side == "right" and
344
+ self.is_mask_instruction and
345
+ instruction_lens > 0):
346
+ # Mask out the instruction tokens for mean-pooling
347
+ attention_mask[:, :instruction_lens] = 0
348
+ features: NVEmbedFeatures = {
349
+ 'input_ids': batch_dict['input_ids'].long(),
350
+ 'attention_mask': batch_dict['attention_mask'],
351
+ 'pool_mask': attention_mask,
352
+ }
353
+ return features
354
+
355
+ @torch.no_grad()
356
+ def _do_encode(self,
357
+ prompts: List[str],
358
+ batch_size: int=1,
359
+ instruction: str="",
360
+ max_length: int=4096,
361
+ num_workers: int=32,
362
+ **kwargs
363
+ ) -> Union[np.ndarray, torch.FloatTensor]:
364
+ dataset: Dataset = Dataset.from_dict({'input_texts': prompts})
365
+ dataset.set_transform(partial(input_transform_func,
366
+ self.tokenizer,
367
+ always_add_eos=True,
368
+ max_length=max_length,
369
+ instruction=instruction))
370
+
371
+ data_collator = DataCollatorWithPadding(self.tokenizer)
372
+ data_loader = DataLoader(
373
+ dataset,
374
+ batch_size=batch_size,
375
+ shuffle=False,
376
+ drop_last=False,
377
+ num_workers=num_workers,
378
+ collate_fn=data_collator,
379
+ pin_memory=True)
380
+
381
+ if self.padding_side == "right" and self.is_mask_instruction and len(instruction) > 0:
382
+ instruction_lens = len(self.tokenizer.tokenize(instruction))
383
+ else:
384
+ instruction_lens = 0
385
+
386
+ encoded_embeds = []
387
+ device = next(self.embedding_model.parameters()).device
388
+ for batch_dict in tqdm(data_loader, desc='encoding', mininterval=10):
389
+ features = self.prepare_kwargs_from_batch(batch_dict, instruction_lens, device=device)
390
+ embeds=self(**features)["sentence_embeddings"].squeeze(1)
391
+ encoded_embeds.append(embeds)
392
+ encoded_embeds = torch.cat(encoded_embeds, axis=0)
393
+ if "return_numpy" in kwargs and kwargs.get("return_numpy"):
394
+ encoded_embeds = encoded_embeds.cpu().detach().numpy()
395
+ return encoded_embeds
396
+
397
+ def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor, pool_mask: Optional[torch.Tensor]=None, return_dict: bool=True):
398
+ autocast_ctx = torch.autocast if torch.cuda.is_available() else nullcontext
399
+ with autocast_ctx("cuda"):
400
+ ## decoder only layer
401
+ outputs = self.embedding_model(
402
+ input_ids=input_ids,
403
+ attention_mask=attention_mask,
404
+ )
405
+ ## latent attention layer
406
+ embeds = self.latent_attention_model(
407
+ outputs.last_hidden_state,
408
+ pool_mask,
409
+ )
410
+ if not return_dict:
411
+ return (embeds,)
412
+ return {"sentence_embeddings": embeds}
413
+
414
+
415
+ @torch.no_grad()
416
+ def encode(self, prompts: List[str], instruction: str="", max_length: int=4096, **kwargs):
417
+ if self.padding_side == "right" and self.is_mask_instruction and len(instruction) > 0:
418
+ instruction_lens = len(self.tokenizer.tokenize(instruction))
419
+ else:
420
+ instruction_lens = 0
421
+
422
+ device = next(self.embedding_model.parameters()).device
423
+ batch_dict = input_transform_func(self.tokenizer,
424
+ {"input_texts": [prompt for prompt in prompts]},
425
+ always_add_eos=True,
426
+ max_length=max_length,
427
+ instruction=instruction)
428
+
429
+ features: NVEmbedFeatures = self.prepare_kwargs_from_batch(batch_dict, instruction_lens, device=device)
430
+ return self(**features)["sentence_embeddings"].squeeze(1)
431
+
432
+
433
+ ## AutoModel Register
434
+ AutoModel.register(NVEmbedConfig, NVEmbedModel)
435
+ AutoModel.register(LatentAttentionConfig, LatentAttentionModel)
436
+ AutoModel.register(BidirectionalMistralConfig, BidirectionalMistralModel)
437
+
438
+ ## Register for auto class
439
+ NVEmbedModel.register_for_auto_class("AutoModel")
440
+ LatentAttentionModel.register_for_auto_class("AutoModel")
441
+ BidirectionalMistralModel.register_for_auto_class("AutoModel")
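With the registrations above, the checkpoint can be loaded directly through `AutoModel` with `trust_remote_code=True`; `NVEmbedModel.encode` tokenizes the prompts, runs the bidirectional Mistral encoder, and pools the hidden states with the latent-attention block, masking instruction tokens out of the mean pool when padding is on the right. A usage sketch follows, in which the repository id and the instruction prefix are placeholders:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

model = AutoModel.from_pretrained("your-org/your-model", trust_remote_code=True,
                                  torch_dtype=torch.float16)
model = model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder instruction prefix; adapt it to the task the model was trained for.
query_prefix = "Instruct: Retrieve passages relevant to the question.\nQuery: "

q = model.encode(["example question"], instruction=query_prefix, max_length=4096)
p = model.encode(["example passage"], instruction="", max_length=4096)

q, p = F.normalize(q, dim=-1), F.normalize(p, dim=-1)
print((q @ p.T).item())
```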
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
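`modules.json` chains three sentence-transformers modules: the Transformer wrapper at the repository root, the pooling module under `1_Pooling`, and a final `Normalize` step, so embeddings returned through sentence-transformers are unit-length and a dot product equals cosine similarity. A short sketch with a placeholder model id:

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code is required because the Transformer module wraps the custom NV-Embed code.
model = SentenceTransformer("your-org/your-model", trust_remote_code=True)

emb = model.encode(["first sentence", "second sentence"])
print(emb @ emb.T)  # rows are unit-normalized, so this is a cosine-similarity matrix
```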
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 1024,
3
+ "do_lower_case": false
4
+ }
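`sentence_bert_config.json` limits the sentence-transformers pipeline to 1024 tokens per input, which is stricter than the 4096-token default used by the standalone `encode()` in `modeling_nvembed.py`; longer documents are truncated. The cap can be adjusted after loading, under the assumption that the underlying model handles the longer sequences:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-org/your-model", trust_remote_code=True)
print(model.max_seq_length)   # 1024, taken from sentence_bert_config.json
model.max_seq_length = 2048   # assumption: raise this only if the base model supports it
```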
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<unk>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ }
30
+ },
31
+ "additional_special_tokens": [],
32
+ "bos_token": "<s>",
33
+ "clean_up_tokenization_spaces": false,
34
+ "eos_token": "</s>",
35
+ "legacy": true,
36
+ "max_length": 1024,
37
+ "model_max_length": 1024,
38
+ "pad_to_multiple_of": null,
39
+ "pad_token": "</s>",
40
+ "pad_token_type_id": 0,
41
+ "padding_side": "right",
42
+ "sp_model_kwargs": {},
43
+ "spaces_between_special_tokens": false,
44
+ "stride": 0,
45
+ "tokenizer_class": "LlamaTokenizer",
46
+ "truncation_side": "right",
47
+ "truncation_strategy": "longest_first",
48
+ "unk_token": "<unk>",
49
+ "use_default_system_prompt": false
50
+ }
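The tokenizer is a `LlamaTokenizer` configured with right-side padding, `</s>` reused as the pad token (matching `add_pad_token` in `modeling_nvembed.py`), and a 1024-token `model_max_length`. A quick check with a placeholder model id:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/your-model")
print(tok.pad_token, tok.padding_side, tok.model_max_length)  # </s> right 1024

batch = tok(["short text", "a somewhat longer text"],
            padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)
```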