tomaarsen (HF staff) committed
Commit 1e62e43 · verified · 1 parent: 5e82f13

Add new CrossEncoder model

README.md ADDED
@@ -0,0 +1,431 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - cross-encoder
5
+ - text-classification
6
+ - generated_from_trainer
7
+ - dataset_size:39780704
8
+ - loss:MarginMSELoss
9
+ base_model: microsoft/MiniLM-L12-H384-uncased
10
+ datasets:
11
+ - tomaarsen/ms-marco-shuffled
12
+ pipeline_tag: text-classification
13
+ library_name: sentence-transformers
14
+ metrics:
15
+ - map
16
+ - mrr@10
17
+ - ndcg@10
18
+ model-index:
19
+ - name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
20
+ results: []
21
+ ---
22
+
23
+ # CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
24
+
25
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.
26
+
27
+ ## Model Details
28
+
29
+ ### Model Description
30
+ - **Model Type:** Cross Encoder
31
+ - **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
32
+ - **Maximum Sequence Length:** 512 tokens
33
+ - **Number of Output Labels:** 1 label
34
+ - **Training Dataset:**
35
+ - [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled)
36
+ <!-- - **Language:** Unknown -->
37
+ <!-- - **License:** Unknown -->
38
+
39
+ ### Model Sources
40
+
41
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
42
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
43
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
44
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
45
+
46
+ ## Usage
47
+
48
+ ### Direct Usage (Sentence Transformers)
49
+
50
+ First install the Sentence Transformers library:
51
+
52
+ ```bash
53
+ pip install -U sentence-transformers
54
+ ```
55
+
56
+ Then you can load this model and run inference.
57
+ ```python
58
+ from sentence_transformers import CrossEncoder
59
+
60
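+ # Download from the 🤗 Hub
+ # (the ID below names a ModernBERT checkpoint, while this card describes a MiniLM-L12-H384 model; replace it with this repository's ID if they differ)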
61
+ model = CrossEncoder("tomaarsen/reranker-modernbert-base-msmarco-margin-mse")
62
+ # Get scores for pairs of texts
63
+ pairs = [
64
+ ['where is joplin airport', 'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.'],
65
+ ['where is the pd on your glasses frame', "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd"],
66
+ ['what year did oldsmobile stop production', 'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.'],
67
+ ['how many sisters did barbie have', "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak."],
68
+ ['who discovered achondroplasia dwarfism', "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene."],
69
+ ]
70
+ scores = model.predict(pairs)
71
+ print(scores.shape)
72
+ # (5,)
73
+
74
+ # Or rank different texts based on similarity to a single text
75
+ ranks = model.rank(
76
+ 'where is joplin airport',
77
+ [
78
+ 'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.',
79
+ "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd",
80
+ 'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.',
81
+ "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak.",
82
+ "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene.",
83
+ ]
84
+ )
85
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
86
+ ```
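The `rank` call above returns one dictionary per candidate text, ordered from highest to lowest score. The ordering step can be sketched in plain Python as follows. `rank_from_scores` is a hypothetical helper written for illustration, not part of the library, and the scores are made up rather than produced by the model.

```python
def rank_from_scores(scores, top_k=None):
    """Order candidates by descending score, mirroring the output shape of `rank`.

    Hypothetical helper: `scores[i]` is assumed to be the relevance score of
    candidate `i` for a single query.
    """
    ranked = sorted(
        ({"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for five candidate passages (illustration only).
example_scores = [9.1, -3.4, -7.8, -6.2, -5.0]
top_two = rank_from_scores(example_scores, top_k=2)
print(top_two)
```

Because `MarginMSELoss` is trained with an identity activation, scores of this kind are unbounded logits, so they are better suited to ordering candidates for one query than to comparison against a fixed threshold.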
87
+
88
+ <!--
89
+ ### Direct Usage (Transformers)
90
+
91
+ <details><summary>Click to see the direct usage in Transformers</summary>
92
+
93
+ </details>
94
+ -->
95
+
96
+ <!--
97
+ ### Downstream Usage (Sentence Transformers)
98
+
99
+ You can finetune this model on your own dataset.
100
+
101
+ <details><summary>Click to expand</summary>
102
+
103
+ </details>
104
+ -->
105
+
106
+ <!--
107
+ ### Out-of-Scope Use
108
+
109
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
110
+ -->
111
+
112
+ ## Evaluation
113
+
114
+ ### Metrics
115
+
116
+ #### Cross Encoder Reranking
117
+
118
+ * Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
119
+ * Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)
120
+
121
+ | Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
122
+ |:------------|:---------------------|:---------------------|:---------------------|
123
+ | map | 0.6114 (+0.1219) | 0.3561 (+0.0857) | 0.6775 (+0.2568) |
124
+ | mrr@10 | 0.6022 (+0.1247) | 0.5900 (+0.0902) | 0.6893 (+0.2626) |
125
+ | **ndcg@10** | **0.6673 (+0.1269)** | **0.4034 (+0.0783)** | **0.7330 (+0.2324)** |
126
+
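For readers unfamiliar with the three metrics in the table above, the following is a minimal sketch of their textbook definitions, computed over one reranked list of binary relevance labels. It is an illustration only: the evaluator's exact handling of edge cases (for example, relevant documents missing from the candidate list) may differ, and the relevance list is invented.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k=10):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def average_precision(relevance):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Invented example: relevant passages end up at ranks 2 and 4.
ranked_relevance = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_relevance))
print(ndcg_at_k(ranked_relevance))
print(average_precision(ranked_relevance))
```

The reported `map`, `mrr@10` and `ndcg@10` values would then be averages of such per-query numbers over all queries in a dataset.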
127
+ #### Cross Encoder Nano BEIR
128
+
129
+ * Dataset: `NanoBEIR_mean`
130
+ * Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)
131
+
132
+ | Metric | Value |
133
+ |:------------|:---------------------|
134
+ | map | 0.5484 (+0.1548) |
135
+ | mrr@10 | 0.6272 (+0.1592) |
136
+ | **ndcg@10** | **0.6012 (+0.1459)** |
137
+
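The `NanoBEIR_mean` values appear to be the unweighted mean of the three per-dataset values reported in the previous table. A quick check for `ndcg@10`, assuming a plain arithmetic mean:

```python
from statistics import mean

# ndcg@10 per dataset, copied from the Cross Encoder Reranking table.
ndcg_at_10 = {
    "NanoMSMARCO": 0.6673,
    "NanoNFCorpus": 0.4034,
    "NanoNQ": 0.7330,
}

nano_beir_mean = mean(ndcg_at_10.values())
print(round(nano_beir_mean, 4))
```

Repeating this for `map` and `mrr@10` reproduces the reported means to within 0.0001, which is the precision to be expected when averaging values that were already rounded to four decimals.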
138
+ <!--
139
+ ## Bias, Risks and Limitations
140
+
141
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
142
+ -->
143
+
144
+ <!--
145
+ ### Recommendations
146
+
147
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
148
+ -->
149
+
150
+ ## Training Details
151
+
152
+ ### Training Dataset
153
+
154
+ #### ms-marco-shuffled
155
+
156
+ * Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
157
+ * Size: 39,780,704 training samples
158
+ * Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
159
+ * Approximate statistics based on the first 1000 samples:
160
+ | | score | query | positive | negative |
161
+ |:--------|:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
162
+ | type | float | string | string | string |
163
+ | details | <ul><li>min: -4.89</li><li>mean: 13.57</li><li>max: 22.32</li></ul> | <ul><li>min: 12 characters</li><li>mean: 33.75 characters</li><li>max: 141 characters</li></ul> | <ul><li>min: 71 characters</li><li>mean: 349.99 characters</li><li>max: 1000 characters</li></ul> | <ul><li>min: 82 characters</li><li>mean: 337.52 characters</li><li>max: 928 characters</li></ul> |
164
+ * Samples:
165
+ | score | query | positive | negative |
166
+ |:--------------------------------|:----------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
167
+ | <code>6.012716511885325</code> | <code>what body part does gases, such as oxygen and carbon dioxide, pass into or out of the blood?</code> | <code>As blood passes through your lungs, oxygen moves into the blood while carbon dioxide moves out of the blood into the lungs. An ABG test uses blood drawn from an artery, where the oxygen and carbon dioxide levels can be measured before they enter body tissues. An ABG measures: 1 Partial pressure of oxygen (PaO2).</code> | <code>Answers. Best Answer: The respiratory system takes in oxygen from the atmosphere and moves that oxygen into the bloodstream. The circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.Carbon dioxide diffuses from the blood into the lungs and it is then exhaled into the atmosphere.he circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.</code> |
168
+ | <code>5.666825115680695</code> | <code>what does iron deficiency do</code> | <code>Iron-deficiency anemia is the most common type of anemia. It happens when you do not have enough iron in your body. Iron deficiency is usually due to blood loss but may occasionally be due to poor absorption of iron. Pregnancy and childbirth consume a great deal of iron and thus can result in pregnancy-related anemia.</code> | <code>color vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.iron deficiency deficiency of iron in the system, as from blood loss, low dietary iron, or a disease condition that inhibits iron uptake.See iron and iron deficiency anemia.olor vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.</code> |
169
+ | <code>14.512734095255535</code> | <code>cost of tavrmasoposed to open heart surgery</code> | <code>Several factors come into play when you’re trying to figure out how much you’re going to have to pay for an open heart surgery. The two biggest factors are what kind of open heart surgery you're having how good your insurance is. A heart transplant runs more than $700,000, significantly more than most annual salaries. Other open heart surgeries are in the neighborhood of $325,000. Much of the expense is not only the four hour long surgery, but also the testing, the anesthesia, and the medication and aftercare that are all part of the package.</code> | <code>Foods You Can Eat After Heart Bypass. Healthy foods provide multiple benefits following heart bypass surgery. Heart bypass surgery, also called coronary bypass surgery, is performed to restore blood flow to your heart when a section of an artery in your heart is blocked.</code> |
170
+ * Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#marginmseloss) with these parameters:
171
+ ```json
172
+ {
173
+ "activation_fct": "torch.nn.modules.linear.Identity"
174
+ }
175
+ ```
176
+
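As a rough illustration of what the loss above optimises: Margin MSE, as described in the cited Hofstätter et al. paper, is the mean squared error between the model's score margin (positive minus negative) and a teacher's margin. The sketch below is plain Python rather than the library's tensor implementation, and it assumes that the dataset's `score` column holds that teacher margin. The student scores are invented.

```python
def margin_mse_loss(pos_scores, neg_scores, teacher_margins):
    """Mean squared error between predicted and teacher score margins."""
    squared_errors = [
        ((pos - neg) - target) ** 2
        for pos, neg, target in zip(pos_scores, neg_scores, teacher_margins)
    ]
    return sum(squared_errors) / len(squared_errors)


# Teacher margins: the `score` values of the first two training samples
# (assumed here to be teacher score differences).
teacher_margins = [6.012716511885325, 5.666825115680695]

# Hypothetical raw model outputs for (query, positive) and (query, negative).
student_pos = [4.0, 7.0]
student_neg = [-1.0, 1.0]

loss = margin_mse_loss(student_pos, student_neg, teacher_margins)
print(loss)
```

With the identity activation listed above, the raw logits enter the margin directly, so the model learns to match the scale of the teacher's margins rather than to produce probabilities.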
177
+ ### Evaluation Dataset
178
+
179
+ #### ms-marco-shuffled
180
+
181
+ * Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
182
+ * Size: 39,780,704 evaluation samples
183
+ * Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
184
+ * Approximate statistics based on the first 1000 samples:
185
+ | | score | query | positive | negative |
186
+ |:--------|:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
187
+ | type | float | string | string | string |
188
+ | details | <ul><li>min: -1.57</li><li>mean: 13.57</li><li>max: 22.36</li></ul> | <ul><li>min: 10 characters</li><li>mean: 34.47 characters</li><li>max: 109 characters</li></ul> | <ul><li>min: 64 characters</li><li>mean: 345.45 characters</li><li>max: 963 characters</li></ul> | <ul><li>min: 56 characters</li><li>mean: 341.89 characters</li><li>max: 947 characters</li></ul> |
189
+ * Samples:
190
+ | score | query | positive | negative |
191
+ |:--------------------------------|:------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
192
+ | <code>16.928720156351726</code> | <code>where is joplin airport</code> | <code>Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.</code> | <code>Hoskins Airport. If you’re flying from or into Hoskins airport or simply collecting someone from their flight to Hoskins, discover all the latest information you need from Hoskins airport. Find directions, airport information and local weather for Hoskins airport and details of airlines that fly to and from Hoskins.</code> |
193
+ | <code>15.824924786885578</code> | <code>where is the pd on your glasses frame</code> | <code>Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd</code> | <code>exists and is an alternate of . Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD. Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD.</code> |
194
+ | <code>18.074473301569622</code> | <code>what year did oldsmobile stop production</code> | <code>Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.</code> | <code>Cinsaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.However, Cinsaut has long been used in Apulian blends and has also begun to attract the attention of winemakers interested in reviving old varieties.insaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.</code> |
195
+ * Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#marginmseloss) with these parameters:
196
+ ```json
197
+ {
198
+ "activation_fct": "torch.nn.modules.linear.Identity"
199
+ }
200
+ ```
201
+
202
+ ### Training Hyperparameters
203
+ #### Non-Default Hyperparameters
204
+
205
+ - `eval_strategy`: steps
206
+ - `per_device_train_batch_size`: 64
207
+ - `per_device_eval_batch_size`: 64
208
+ - `learning_rate`: 8e-06
209
+ - `num_train_epochs`: 1
210
+ - `warmup_ratio`: 0.1
211
+ - `seed`: 12
212
+ - `bf16`: True
213
+ - `dataloader_num_workers`: 4
214
+ - `load_best_model_at_end`: True
215
+
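The combination of `learning_rate: 8e-06`, `warmup_ratio: 0.1` and the default linear scheduler describes a ramp from zero to the peak rate followed by a linear decay back to zero. A minimal sketch of that shape is below. The total step count of 31,094 is an assumption, picked to be roughly consistent with the training log reaching epoch 0.9970 at step 31,000, and the rounding of the warmup length is likewise assumed.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=8e-06, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 31_094  # assumption, not stated in the card

for step in (0, 1_000, 3_110, 15_000, 31_094):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak rate is reached around step 3,110, so the first few rows of the training log would fall inside the warmup phase.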
216
+ #### All Hyperparameters
217
+ <details><summary>Click to expand</summary>
218
+
219
+ - `overwrite_output_dir`: False
220
+ - `do_predict`: False
221
+ - `eval_strategy`: steps
222
+ - `prediction_loss_only`: True
223
+ - `per_device_train_batch_size`: 64
224
+ - `per_device_eval_batch_size`: 64
225
+ - `per_gpu_train_batch_size`: None
226
+ - `per_gpu_eval_batch_size`: None
227
+ - `gradient_accumulation_steps`: 1
228
+ - `eval_accumulation_steps`: None
229
+ - `torch_empty_cache_steps`: None
230
+ - `learning_rate`: 8e-06
231
+ - `weight_decay`: 0.0
232
+ - `adam_beta1`: 0.9
233
+ - `adam_beta2`: 0.999
234
+ - `adam_epsilon`: 1e-08
235
+ - `max_grad_norm`: 1.0
236
+ - `num_train_epochs`: 1
237
+ - `max_steps`: -1
238
+ - `lr_scheduler_type`: linear
239
+ - `lr_scheduler_kwargs`: {}
240
+ - `warmup_ratio`: 0.1
241
+ - `warmup_steps`: 0
242
+ - `log_level`: passive
243
+ - `log_level_replica`: warning
244
+ - `log_on_each_node`: True
245
+ - `logging_nan_inf_filter`: True
246
+ - `save_safetensors`: True
247
+ - `save_on_each_node`: False
248
+ - `save_only_model`: False
249
+ - `restore_callback_states_from_checkpoint`: False
250
+ - `no_cuda`: False
251
+ - `use_cpu`: False
252
+ - `use_mps_device`: False
253
+ - `seed`: 12
254
+ - `data_seed`: None
255
+ - `jit_mode_eval`: False
256
+ - `use_ipex`: False
257
+ - `bf16`: True
258
+ - `fp16`: False
259
+ - `fp16_opt_level`: O1
260
+ - `half_precision_backend`: auto
261
+ - `bf16_full_eval`: False
262
+ - `fp16_full_eval`: False
263
+ - `tf32`: None
264
+ - `local_rank`: 0
265
+ - `ddp_backend`: None
266
+ - `tpu_num_cores`: None
267
+ - `tpu_metrics_debug`: False
268
+ - `debug`: []
269
+ - `dataloader_drop_last`: False
270
+ - `dataloader_num_workers`: 4
271
+ - `dataloader_prefetch_factor`: None
272
+ - `past_index`: -1
273
+ - `disable_tqdm`: False
274
+ - `remove_unused_columns`: True
275
+ - `label_names`: None
276
+ - `load_best_model_at_end`: True
277
+ - `ignore_data_skip`: False
278
+ - `fsdp`: []
279
+ - `fsdp_min_num_params`: 0
280
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
281
+ - `fsdp_transformer_layer_cls_to_wrap`: None
282
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
283
+ - `deepspeed`: None
284
+ - `label_smoothing_factor`: 0.0
285
+ - `optim`: adamw_torch
286
+ - `optim_args`: None
287
+ - `adafactor`: False
288
+ - `group_by_length`: False
289
+ - `length_column_name`: length
290
+ - `ddp_find_unused_parameters`: None
291
+ - `ddp_bucket_cap_mb`: None
292
+ - `ddp_broadcast_buffers`: False
293
+ - `dataloader_pin_memory`: True
294
+ - `dataloader_persistent_workers`: False
295
+ - `skip_memory_metrics`: True
296
+ - `use_legacy_prediction_loop`: False
297
+ - `push_to_hub`: False
298
+ - `resume_from_checkpoint`: None
299
+ - `hub_model_id`: None
300
+ - `hub_strategy`: every_save
301
+ - `hub_private_repo`: None
302
+ - `hub_always_push`: False
303
+ - `gradient_checkpointing`: False
304
+ - `gradient_checkpointing_kwargs`: None
305
+ - `include_inputs_for_metrics`: False
306
+ - `include_for_metrics`: []
307
+ - `eval_do_concat_batches`: True
308
+ - `fp16_backend`: auto
309
+ - `push_to_hub_model_id`: None
310
+ - `push_to_hub_organization`: None
311
+ - `mp_parameters`:
312
+ - `auto_find_batch_size`: False
313
+ - `full_determinism`: False
314
+ - `torchdynamo`: None
315
+ - `ray_scope`: last
316
+ - `ddp_timeout`: 1800
317
+ - `torch_compile`: False
318
+ - `torch_compile_backend`: None
319
+ - `torch_compile_mode`: None
320
+ - `dispatch_batches`: None
321
+ - `split_batches`: None
322
+ - `include_tokens_per_second`: False
323
+ - `include_num_input_tokens_seen`: False
324
+ - `neftune_noise_alpha`: None
325
+ - `optim_target_modules`: None
326
+ - `batch_eval_metrics`: False
327
+ - `eval_on_start`: False
328
+ - `use_liger_kernel`: False
329
+ - `eval_use_gather_object`: False
330
+ - `average_tokens_across_devices`: False
331
+ - `prompts`: None
332
+ - `batch_sampler`: batch_sampler
333
+ - `multi_dataset_batch_sampler`: proportional
334
+
335
+ </details>
336
+
337
+ ### Training Logs
338
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
339
+ |:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
340
+ | -1 | -1 | - | - | 0.0255 (-0.5150) | 0.3351 (+0.0101) | 0.0539 (-0.4467) | 0.1382 (-0.3172) |
341
+ | 0.0000 | 1 | 197.7525 | - | - | - | - | - |
342
+ | 0.0322 | 1000 | 189.9111 | - | - | - | - | - |
343
+ | 0.0643 | 2000 | 100.2999 | - | - | - | - | - |
344
+ | 0.0965 | 3000 | 33.4914 | - | - | - | - | - |
345
+ | 0.1286 | 4000 | 10.2638 | - | - | - | - | - |
346
+ | 0.1608 | 5000 | 7.333 | 6.1981 | 0.6326 (+0.0922) | 0.4145 (+0.0894) | 0.6989 (+0.1983) | 0.5820 (+0.1266) |
347
+ | 0.1930 | 6000 | 6.2212 | - | - | - | - | - |
348
+ | 0.2251 | 7000 | 5.6437 | - | - | - | - | - |
349
+ | 0.2573 | 8000 | 5.3485 | - | - | - | - | - |
350
+ | 0.2894 | 9000 | 5.0373 | - | - | - | - | - |
351
+ | 0.3216 | 10000 | 4.7753 | 4.3763 | 0.6565 (+0.1161) | 0.4161 (+0.0910) | 0.7294 (+0.2288) | 0.6007 (+0.1453) |
352
+ | 0.3538 | 11000 | 4.5805 | - | - | - | - | - |
353
+ | 0.3859 | 12000 | 4.4494 | - | - | - | - | - |
354
+ | 0.4181 | 13000 | 4.3038 | - | - | - | - | - |
355
+ | 0.4502 | 14000 | 4.2497 | - | - | - | - | - |
356
+ | **0.4824** | **15000** | **4.116** | **4.0312** | **0.6673 (+0.1269)** | **0.4034 (+0.0783)** | **0.7330 (+0.2324)** | **0.6012 (+0.1459)** |
357
+ | 0.5146 | 16000 | 4.0779 | - | - | - | - | - |
358
+ | 0.5467 | 17000 | 4.0045 | - | - | - | - | - |
359
+ | 0.5789 | 18000 | 3.8951 | - | - | - | - | - |
360
+ | 0.6111 | 19000 | 3.8733 | - | - | - | - | - |
361
+ | 0.6432 | 20000 | 3.7693 | 3.7577 | 0.6624 (+0.1220) | 0.4052 (+0.0802) | 0.7282 (+0.2276) | 0.5986 (+0.1432) |
362
+ | 0.6754 | 21000 | 3.794 | - | - | - | - | - |
363
+ | 0.7075 | 22000 | 3.6753 | - | - | - | - | - |
364
+ | 0.7397 | 23000 | 3.6859 | - | - | - | - | - |
365
+ | 0.7719 | 24000 | 3.6511 | - | - | - | - | - |
366
+ | 0.8040 | 25000 | 3.6294 | 3.6983 | 0.6507 (+0.1103) | 0.4054 (+0.0804) | 0.7291 (+0.2284) | 0.5951 (+0.1397) |
367
+ | 0.8362 | 26000 | 3.6437 | - | - | - | - | - |
368
+ | 0.8683 | 27000 | 3.549 | - | - | - | - | - |
369
+ | 0.9005 | 28000 | 3.529 | - | - | - | - | - |
370
+ | 0.9327 | 29000 | 3.535 | - | - | - | - | - |
371
+ | 0.9648 | 30000 | 3.5088 | 3.6602 | 0.6574 (+0.1170) | 0.4052 (+0.0801) | 0.7230 (+0.2223) | 0.5952 (+0.1398) |
372
+ | 0.9970 | 31000 | 3.472 | - | - | - | - | - |
373
+ | -1 | -1 | - | - | 0.6673 (+0.1269) | 0.4034 (+0.0783) | 0.7330 (+0.2324) | 0.6012 (+0.1459) |
374
+
375
+ * The bold row denotes the saved checkpoint.
376
+
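The epoch and step columns of the log can be combined with the batch size to estimate how many training samples one epoch covered. The sketch below assumes a single device, which the card does not state, and uses the listed `gradient_accumulation_steps` of 1.

```python
per_device_train_batch_size = 64
gradient_accumulation_steps = 1
num_devices = 1  # assumption: the card does not state the device count

# One row of the training log: step 31000 corresponds to epoch 0.9970.
logged_step = 31_000
logged_epoch = 0.9970

steps_per_epoch = logged_step / logged_epoch
samples_per_epoch = (
    steps_per_epoch
    * per_device_train_batch_size
    * gradient_accumulation_steps
    * num_devices
)
print(round(steps_per_epoch), round(samples_per_epoch))
```

Under these assumptions one epoch would cover roughly two million samples, far fewer than the 39,780,704 listed for the dataset. That would be consistent with training on a subset, or with more devices than assumed here; the card does not say which.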
377
+ ### Framework Versions
378
+ - Python: 3.11.10
379
+ - Sentence Transformers: 3.5.0.dev0
380
+ - Transformers: 4.49.0.dev0
381
+ - PyTorch: 2.6.0.dev20241112+cu121
382
+ - Accelerate: 1.2.0
383
+ - Datasets: 3.2.0
384
+ - Tokenizers: 0.21.0
385
+
386
+ ## Citation
387
+
388
+ ### BibTeX
389
+
390
+ #### Sentence Transformers
391
+ ```bibtex
392
+ @inproceedings{reimers-2019-sentence-bert,
393
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
394
+ author = "Reimers, Nils and Gurevych, Iryna",
395
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
396
+ month = "11",
397
+ year = "2019",
398
+ publisher = "Association for Computational Linguistics",
399
+ url = "https://arxiv.org/abs/1908.10084",
400
+ }
401
+ ```
402
+
403
+ #### MarginMSELoss
404
+ ```bibtex
405
+ @misc{hofstätter2021improving,
406
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
407
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
408
+ year={2021},
409
+ eprint={2010.02666},
410
+ archivePrefix={arXiv},
411
+ primaryClass={cs.IR}
412
+ }
413
+ ```
414
+
415
+ <!--
416
+ ## Glossary
417
+
418
+ *Clearly define terms in order to be accessible across audiences.*
419
+ -->
420
+
421
+ <!--
422
+ ## Model Card Authors
423
+
424
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
425
+ -->
426
+
427
+ <!--
428
+ ## Model Card Contact
429
+
430
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
431
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
3
+ "architectures": [
4
+ "BertForSequenceClassification"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 384,
11
+ "id2label": {
12
+ "0": "LABEL_0"
13
+ },
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 1536,
16
+ "label2id": {
17
+ "LABEL_0": 0
18
+ },
19
+ "layer_norm_eps": 1e-12,
20
+ "max_position_embeddings": 512,
21
+ "model_type": "bert",
22
+ "num_attention_heads": 12,
23
+ "num_hidden_layers": 12,
24
+ "pad_token_id": 0,
25
+ "position_embedding_type": "absolute",
26
+ "torch_dtype": "float32",
27
+ "transformers_version": "4.49.0.dev0",
28
+ "type_vocab_size": 2,
29
+ "use_cache": true,
30
+ "vocab_size": 30522
31
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:04f5bbbaf47a34ed2db404b4111d7cd891c68dfe0c32e2868db96eae3c2723b8
3
+ size 133464836
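As a sanity check, the `config.json` values above can be used to estimate the parameter count and compare it with the size in the LFS pointer. The sketch assumes the standard BERT layout for `BertForSequenceClassification` (embeddings, 12 encoder layers, pooler, one-label classifier) and 4 bytes per `float32` parameter. It is an estimate, not a reading of the actual weight file.

```python
config = {
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 12,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 30522,
    "num_labels": 1,  # one entry in id2label
}


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def estimate_bert_parameters(cfg):
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm(h)
    attention = 4 * linear(h, h) + layer_norm(h)  # Q, K, V, output projection
    feed_forward = linear(h, i) + linear(i, h) + layer_norm(h)
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = linear(h, h)
    classifier = linear(h, cfg["num_labels"])
    return embeddings + encoder + pooler + classifier


estimated_params = estimate_bert_parameters(config)
estimated_bytes = estimated_params * 4  # float32
file_size_bytes = 133_464_836  # from the LFS pointer

print(estimated_params, file_size_bytes - estimated_bytes)
```

The estimate comes to about 33.4 million parameters, and the remaining difference of a few tens of kilobytes is plausibly the safetensors header and any small buffers stored alongside the weights.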
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
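The special-token IDs and `model_max_length` above determine how a query and passage are packed into one input for a BERT-style cross-encoder: `[CLS] query [SEP] passage [SEP]`, with segment IDs separating the two texts. The sketch below illustrates that layout with placeholder token IDs. `build_pair_input` is a hypothetical helper, and its passage-only truncation is a simplification that may not match the real tokenizer's truncation strategy.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from added_tokens_decoder
MAX_LENGTH = 512                      # model_max_length


def build_pair_input(query_ids, passage_ids, max_length=MAX_LENGTH):
    """Pack two token-ID lists as [CLS] query [SEP] passage [SEP].

    Hypothetical helper: truncates only the passage, which is a simplification.
    """
    budget = max_length - len(query_ids) - 3  # room left after 3 special tokens
    if budget < 0:
        raise ValueError("query alone exceeds the length budget")
    passage_ids = passage_ids[:budget]
    input_ids = [CLS_ID] + query_ids + [SEP_ID] + passage_ids + [SEP_ID]
    # Segment 0 covers [CLS], the query and the first [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    return input_ids, token_type_ids


# Placeholder token IDs standing in for a tokenized query and passage.
input_ids, token_type_ids = build_pair_input([2073, 2003], [3000, 3001, 3002])
print(input_ids)
print(token_type_ids)
```

In practice the `CrossEncoder` class handles tokenization and packing itself, so this layout matters mainly when reasoning about how much of a long passage fits within the 512-token limit.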
vocab.txt ADDED
The diff for this file is too large to render. See raw diff