Add new CrossEncoder model
- README.md +431 -0
- config.json +31 -0
- model.safetensors +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +58 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,431 @@
---
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:39780704
- loss:MarginMSELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- tomaarsen/ms-marco-shuffled
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes a relevance score for each pair of texts (here, a query and a passage), which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-modernbert-base-msmarco-margin-mse")
# Get scores for pairs of texts
pairs = [
    ['where is joplin airport', 'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.'],
    ['where is the pd on your glasses frame', "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd"],
    ['what year did oldsmobile stop production', 'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.'],
    ['how many sisters did barbie have', "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak."],
    ['who discovered achondroplasia dwarfism', "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'where is joplin airport',
    [
        'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.',
        "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd",
        'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.',
        "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak.",
        "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
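
The ranking output refers to each text only by its position (`corpus_id`), so a small helper is usually needed to get back to the text itself. The sketch below assumes the output has the shape shown in the comment above; the `ranks` values, the shortened `documents` list, and the `attach_documents` helper are all invented for illustration and do not come from this model.

```python
# Hypothetical inputs: shortened documents and made-up scores in the
# [{'corpus_id': ..., 'score': ...}, ...] shape shown above.
documents = [
    "Joplin Regional Airport is a city-owned airport four miles north of Joplin.",
    "Pupillary Distance (PD) is needed to order glasses online.",
    "Oldsmobile production stopped in 2004.",
]
ranks = [
    {"corpus_id": 0, "score": 9.1},
    {"corpus_id": 2, "score": -3.4},
    {"corpus_id": 1, "score": -5.0},
]


def attach_documents(ranks, documents, top_k=None):
    """Pair each ranked entry with its text, highest score first."""
    ordered = sorted(ranks, key=lambda entry: entry["score"], reverse=True)
    if top_k is not None:
        ordered = ordered[:top_k]
    return [(entry["score"], documents[entry["corpus_id"]]) for entry in ordered]


top = attach_documents(ranks, documents, top_k=2)
for score, text in top:
    print(f"{score:6.2f}  {text}")
```

Recent releases of the library also accept `top_k` and `return_documents` arguments on `rank`, which may make a helper like this unnecessary; check the documentation of the installed version.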

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.6114 (+0.1219)     | 0.3561 (+0.0857)     | 0.6775 (+0.2568)     |
| mrr@10      | 0.6022 (+0.1247)     | 0.5900 (+0.0902)     | 0.6893 (+0.2626)     |
| **ndcg@10** | **0.6673 (+0.1269)** | **0.4034 (+0.0783)** | **0.7330 (+0.2324)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5484 (+0.1548)     |
| mrr@10      | 0.6272 (+0.1592)     |
| **ndcg@10** | **0.6012 (+0.1459)** |
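
For readers unfamiliar with the three metrics in these tables, the following is a minimal single-query sketch of how average precision, MRR@10, and NDCG@10 are commonly defined. It is not the evaluators' implementation, which may differ in details; the function names and the example relevance list are invented, and the reported figures are averages over many queries.

```python
import math


def average_precision(relevance):
    """Mean of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k=10):
    """DCG divided by the DCG of the ideal ordering of the same list.

    Assumes every relevant item for the query is present in `relevance`.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical reranked list for one query: 1 = relevant, 0 = not relevant.
reranked = [0, 1, 0, 1, 0]
print(average_precision(reranked), mrr_at_k(reranked), ndcg_at_k(reranked))
```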

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms-marco-shuffled

* Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
* Size: 39,780,704 training samples
* Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | score | query | positive | negative |
  |:--------|:------|:------|:---------|:---------|
  | type    | float | string | string | string |
  | details | <ul><li>min: -4.89</li><li>mean: 13.57</li><li>max: 22.32</li></ul> | <ul><li>min: 12 characters</li><li>mean: 33.75 characters</li><li>max: 141 characters</li></ul> | <ul><li>min: 71 characters</li><li>mean: 349.99 characters</li><li>max: 1000 characters</li></ul> | <ul><li>min: 82 characters</li><li>mean: 337.52 characters</li><li>max: 928 characters</li></ul> |
* Samples:
  | score | query | positive | negative |
  |:------|:------|:---------|:---------|
  | <code>6.012716511885325</code> | <code>what body part does gases, such as oxygen and carbon dioxide, pass into or out of the blood?</code> | <code>As blood passes through your lungs, oxygen moves into the blood while carbon dioxide moves out of the blood into the lungs. An ABG test uses blood drawn from an artery, where the oxygen and carbon dioxide levels can be measured before they enter body tissues. An ABG measures: 1 Partial pressure of oxygen (PaO2).</code> | <code>Answers. Best Answer: The respiratory system takes in oxygen from the atmosphere and moves that oxygen into the bloodstream. The circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.Carbon dioxide diffuses from the blood into the lungs and it is then exhaled into the atmosphere.he circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.</code> |
  | <code>5.666825115680695</code> | <code>what does iron deficiency do</code> | <code>Iron-deficiency anemia is the most common type of anemia. It happens when you do not have enough iron in your body. Iron deficiency is usually due to blood loss but may occasionally be due to poor absorption of iron. Pregnancy and childbirth consume a great deal of iron and thus can result in pregnancy-related anemia.</code> | <code>color vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.iron deficiency deficiency of iron in the system, as from blood loss, low dietary iron, or a disease condition that inhibits iron uptake.See iron and iron deficiency anemia.olor vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.</code> |
  | <code>14.512734095255535</code> | <code>cost of tavrmasoposed to open heart surgery</code> | <code>Several factors come into play when you’re trying to figure out how much you’re going to have to pay for an open heart surgery. The two biggest factors are what kind of open heart surgery you're having how good your insurance is. A heart transplant runs more than $700,000, significantly more than most annual salaries. Other open heart surgeries are in the neighborhood of $325,000. Much of the expense is not only the four hour long surgery, but also the testing, the anesthesia, and the medication and aftercare that are all part of the package.</code> | <code>Foods You Can Eat After Heart Bypass. Healthy foods provide multiple benefits following heart bypass surgery. Heart bypass surgery, also called coronary bypass surgery, is performed to restore blood flow to your heart when a section of an artery in your heart is blocked.</code> |
* Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#marginmseloss) with these parameters:
  ```json
  {
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```
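
The loss named above compares score margins rather than absolute scores. A minimal pure-Python sketch of that idea follows, assuming the `score` column is a target margin between the positive and the negative passage for a query (the column list suggests this, but the card does not state it). The function name and all numbers are invented for illustration, and the library's `MarginMSELoss` works on batched model outputs rather than plain lists.

```python
def margin_mse_loss(pos_scores, neg_scores, target_margins):
    """Mean squared error between predicted and target score margins.

    pos_scores / neg_scores: model scores for (query, positive) and
    (query, negative) pairs; target_margins: the margins to reproduce.
    """
    if not (len(pos_scores) == len(neg_scores) == len(target_margins)):
        raise ValueError("all inputs must have the same length")
    squared_errors = [
        ((pos - neg) - target) ** 2
        for pos, neg, target in zip(pos_scores, neg_scores, target_margins)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical raw scores for two (query, positive, negative) triplets.
pos_scores = [8.0, 2.0]
neg_scores = [1.0, 1.5]
target_margins = [6.0, 5.5]

loss = margin_mse_loss(pos_scores, neg_scores, target_margins)
print(loss)  # predicted margins 7.0 and 0.5 -> errors 1.0 and -5.0 -> 13.0
```

Because the `Identity` activation leaves the outputs unsquashed, a model that initially scores every pair near zero would start with a loss of at least the squared mean target margin. With the mean of 13.57 reported above, that is about 184, the same order as the first logged training loss (197.7525).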

### Evaluation Dataset

#### ms-marco-shuffled

* Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
* Size: 39,780,704 evaluation samples
* Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | score | query | positive | negative |
  |:--------|:------|:------|:---------|:---------|
  | type    | float | string | string | string |
  | details | <ul><li>min: -1.57</li><li>mean: 13.57</li><li>max: 22.36</li></ul> | <ul><li>min: 10 characters</li><li>mean: 34.47 characters</li><li>max: 109 characters</li></ul> | <ul><li>min: 64 characters</li><li>mean: 345.45 characters</li><li>max: 963 characters</li></ul> | <ul><li>min: 56 characters</li><li>mean: 341.89 characters</li><li>max: 947 characters</li></ul> |
* Samples:
  | score | query | positive | negative |
  |:------|:------|:---------|:---------|
  | <code>16.928720156351726</code> | <code>where is joplin airport</code> | <code>Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.</code> | <code>Hoskins Airport. If you’re flying from or into Hoskins airport or simply collecting someone from their flight to Hoskins, discover all the latest information you need from Hoskins airport. Find directions, airport information and local weather for Hoskins airport and details of airlines that fly to and from Hoskins.</code> |
  | <code>15.824924786885578</code> | <code>where is the pd on your glasses frame</code> | <code>Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd</code> | <code>exists and is an alternate of . Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD. Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD.</code> |
  | <code>18.074473301569622</code> | <code>what year did oldsmobile stop production</code> | <code>Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.</code> | <code>Cinsaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.However, Cinsaut has long been used in Apulian blends and has also begun to attract the attention of winemakers interested in reviving old varieties.insaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.</code> |
* Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#marginmseloss) with these parameters:
  ```json
  {
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 8e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
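
Together, `learning_rate: 8e-06`, `warmup_ratio: 0.1`, and the `linear` scheduler listed under All Hyperparameters describe a learning rate that ramps up over the first 10% of steps and then decays linearly to zero. The sketch below models that shape on the linear-warmup schedule used by Transformers; the function name is made up and the total step count is a hypothetical round number, not the run's actual value.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=8e-06, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 1000  # hypothetical; chosen only to make the numbers easy to read
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With these assumptions the peak rate of 8e-06 is reached at step 100, and the rate is back to half of the peak at step 550.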

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 8e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
| -1 | -1 | - | - | 0.0255 (-0.5150) | 0.3351 (+0.0101) | 0.0539 (-0.4467) | 0.1382 (-0.3172) |
| 0.0000 | 1 | 197.7525 | - | - | - | - | - |
| 0.0322 | 1000 | 189.9111 | - | - | - | - | - |
| 0.0643 | 2000 | 100.2999 | - | - | - | - | - |
| 0.0965 | 3000 | 33.4914 | - | - | - | - | - |
| 0.1286 | 4000 | 10.2638 | - | - | - | - | - |
| 0.1608 | 5000 | 7.333 | 6.1981 | 0.6326 (+0.0922) | 0.4145 (+0.0894) | 0.6989 (+0.1983) | 0.5820 (+0.1266) |
| 0.1930 | 6000 | 6.2212 | - | - | - | - | - |
| 0.2251 | 7000 | 5.6437 | - | - | - | - | - |
| 0.2573 | 8000 | 5.3485 | - | - | - | - | - |
| 0.2894 | 9000 | 5.0373 | - | - | - | - | - |
| 0.3216 | 10000 | 4.7753 | 4.3763 | 0.6565 (+0.1161) | 0.4161 (+0.0910) | 0.7294 (+0.2288) | 0.6007 (+0.1453) |
| 0.3538 | 11000 | 4.5805 | - | - | - | - | - |
| 0.3859 | 12000 | 4.4494 | - | - | - | - | - |
| 0.4181 | 13000 | 4.3038 | - | - | - | - | - |
| 0.4502 | 14000 | 4.2497 | - | - | - | - | - |
| **0.4824** | **15000** | **4.116** | **4.0312** | **0.6673 (+0.1269)** | **0.4034 (+0.0783)** | **0.7330 (+0.2324)** | **0.6012 (+0.1459)** |
| 0.5146 | 16000 | 4.0779 | - | - | - | - | - |
| 0.5467 | 17000 | 4.0045 | - | - | - | - | - |
| 0.5789 | 18000 | 3.8951 | - | - | - | - | - |
| 0.6111 | 19000 | 3.8733 | - | - | - | - | - |
| 0.6432 | 20000 | 3.7693 | 3.7577 | 0.6624 (+0.1220) | 0.4052 (+0.0802) | 0.7282 (+0.2276) | 0.5986 (+0.1432) |
| 0.6754 | 21000 | 3.794 | - | - | - | - | - |
| 0.7075 | 22000 | 3.6753 | - | - | - | - | - |
| 0.7397 | 23000 | 3.6859 | - | - | - | - | - |
| 0.7719 | 24000 | 3.6511 | - | - | - | - | - |
| 0.8040 | 25000 | 3.6294 | 3.6983 | 0.6507 (+0.1103) | 0.4054 (+0.0804) | 0.7291 (+0.2284) | 0.5951 (+0.1397) |
| 0.8362 | 26000 | 3.6437 | - | - | - | - | - |
| 0.8683 | 27000 | 3.549 | - | - | - | - | - |
| 0.9005 | 28000 | 3.529 | - | - | - | - | - |
| 0.9327 | 29000 | 3.535 | - | - | - | - | - |
| 0.9648 | 30000 | 3.5088 | 3.6602 | 0.6574 (+0.1170) | 0.4052 (+0.0801) | 0.7230 (+0.2223) | 0.5952 (+0.1398) |
| 0.9970 | 31000 | 3.472 | - | - | - | - | - |
| -1 | -1 | - | - | 0.6673 (+0.1269) | 0.4034 (+0.0783) | 0.7330 (+0.2324) | 0.6012 (+0.1459) |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.6.0.dev20241112+cu121
- Accelerate: 1.2.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MarginMSELoss
```bibtex
@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json
ADDED
@@ -0,0 +1,31 @@
|
1 |
+
{
|
2 |
+
"_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
|
3 |
+
"architectures": [
|
4 |
+
"BertForSequenceClassification"
|
5 |
+
],
|
6 |
+
"attention_probs_dropout_prob": 0.1,
|
7 |
+
"classifier_dropout": null,
|
8 |
+
"hidden_act": "gelu",
|
9 |
+
"hidden_dropout_prob": 0.1,
|
10 |
+
"hidden_size": 384,
|
11 |
+
"id2label": {
|
12 |
+
"0": "LABEL_0"
|
13 |
+
},
|
14 |
+
"initializer_range": 0.02,
|
15 |
+
"intermediate_size": 1536,
|
16 |
+
"label2id": {
|
17 |
+
"LABEL_0": 0
|
18 |
+
},
|
19 |
+
"layer_norm_eps": 1e-12,
|
20 |
+
"max_position_embeddings": 512,
|
21 |
+
"model_type": "bert",
|
22 |
+
"num_attention_heads": 12,
|
23 |
+
"num_hidden_layers": 12,
|
24 |
+
"pad_token_id": 0,
|
25 |
+
"position_embedding_type": "absolute",
|
26 |
+
"torch_dtype": "float32",
|
27 |
+
"transformers_version": "4.49.0.dev0",
|
28 |
+
"type_vocab_size": 2,
|
29 |
+
"use_cache": true,
|
30 |
+
"vocab_size": 30522
|
31 |
+
}
|
model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:04f5bbbaf47a34ed2db404b4111d7cd891c68dfe0c32e2868db96eae3c2723b8
|
3 |
+
size 133464836
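
As a rough sanity check, the parameter count implied by `config.json` can be compared with the `model.safetensors` size reported above. This is a sketch under stated assumptions: it assumes the standard `BertForSequenceClassification` layout (embeddings, 12 encoder layers, pooler, and a linear classifier with one output inferred from the single `id2label` entry), and the helper names are hypothetical.

```python
# Values copied from the config.json in this commit.
config = {
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 12,
    "num_labels": 1,  # assumption: inferred from the single id2label entry
}


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(hidden):
    """Scale plus shift of a LayerNorm."""
    return 2 * hidden


def count_parameters(cfg):
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    # Word, position and token-type embeddings, followed by a LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm(h)
    # Query, key, value and output projections, followed by a LayerNorm.
    attention = 4 * linear(h, h) + layer_norm(h)
    # Two dense layers of the feed-forward block, followed by a LayerNorm.
    feed_forward = linear(h, i) + linear(i, h) + layer_norm(h)
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = linear(h, h)
    classifier = linear(h, cfg["num_labels"])
    return embeddings + encoder + pooler + classifier


total = count_parameters(config)
float32_bytes = total * 4  # torch_dtype is float32, i.e. 4 bytes per parameter
print(total, float32_bytes)
```

Under these assumptions the estimate is about 33.4 million parameters, or 133,441,540 bytes in float32. That is 23,296 bytes less than the reported file size, a gap that would be consistent with the safetensors metadata header, although the diff does not confirm this.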
|
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
|
1 |
+
{
|
2 |
+
"cls_token": "[CLS]",
|
3 |
+
"mask_token": "[MASK]",
|
4 |
+
"pad_token": "[PAD]",
|
5 |
+
"sep_token": "[SEP]",
|
6 |
+
"unk_token": "[UNK]"
|
7 |
+
}
|
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
tokenizer_config.json
ADDED
@@ -0,0 +1,58 @@
|
1 |
+
{
|
2 |
+
"added_tokens_decoder": {
|
3 |
+
"0": {
|
4 |
+
"content": "[PAD]",
|
5 |
+
"lstrip": false,
|
6 |
+
"normalized": false,
|
7 |
+
"rstrip": false,
|
8 |
+
"single_word": false,
|
9 |
+
"special": true
|
10 |
+
},
|
11 |
+
"100": {
|
12 |
+
"content": "[UNK]",
|
13 |
+
"lstrip": false,
|
14 |
+
"normalized": false,
|
15 |
+
"rstrip": false,
|
16 |
+
"single_word": false,
|
17 |
+
"special": true
|
18 |
+
},
|
19 |
+
"101": {
|
20 |
+
"content": "[CLS]",
|
21 |
+
"lstrip": false,
|
22 |
+
"normalized": false,
|
23 |
+
"rstrip": false,
|
24 |
+
"single_word": false,
|
25 |
+
"special": true
|
26 |
+
},
|
27 |
+
"102": {
|
28 |
+
"content": "[SEP]",
|
29 |
+
"lstrip": false,
|
30 |
+
"normalized": false,
|
31 |
+
"rstrip": false,
|
32 |
+
"single_word": false,
|
33 |
+
"special": true
|
34 |
+
},
|
35 |
+
"103": {
|
36 |
+
"content": "[MASK]",
|
37 |
+
"lstrip": false,
|
38 |
+
"normalized": false,
|
39 |
+
"rstrip": false,
|
40 |
+
"single_word": false,
|
41 |
+
"special": true
|
42 |
+
}
|
43 |
+
},
|
44 |
+
"clean_up_tokenization_spaces": true,
|
45 |
+
"cls_token": "[CLS]",
|
46 |
+
"do_basic_tokenize": true,
|
47 |
+
"do_lower_case": true,
|
48 |
+
"extra_special_tokens": {},
|
49 |
+
"mask_token": "[MASK]",
|
50 |
+
"model_max_length": 512,
|
51 |
+
"never_split": null,
|
52 |
+
"pad_token": "[PAD]",
|
53 |
+
"sep_token": "[SEP]",
|
54 |
+
"strip_accents": null,
|
55 |
+
"tokenize_chinese_chars": true,
|
56 |
+
"tokenizer_class": "BertTokenizer",
|
57 |
+
"unk_token": "[UNK]"
|
58 |
+
}
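
The special-token IDs and `model_max_length` above determine how a query and passage pair is typically laid out for a BERT-style cross-encoder. The sketch below illustrates that layout in plain Python; it is not the tokenizer's actual implementation, `build_pair_inputs` is a hypothetical helper, and the word-piece IDs in the usage example are made-up placeholders.

```python
# Special-token IDs copied from added_tokens_decoder in tokenizer_config.json.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
MODEL_MAX_LENGTH = 512  # model_max_length from tokenizer_config.json


def build_pair_inputs(query_ids, passage_ids, max_length=MODEL_MAX_LENGTH):
    """Lay out a pair as [CLS] query [SEP] passage [SEP], padded to max_length.

    query_ids and passage_ids are already-tokenized word-piece IDs.
    Truncation is deliberately left out of this sketch.
    """
    input_ids = [CLS_ID] + query_ids + [SEP_ID] + passage_ids + [SEP_ID]
    if len(input_ids) > max_length:
        raise ValueError("pair is longer than max_length; truncate it first")
    # Segment 0 covers [CLS] query [SEP]; segment 1 covers passage [SEP].
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    padding = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * padding,
        "token_type_ids": token_type_ids + [0] * padding,
        "attention_mask": [1] * len(input_ids) + [0] * padding,
    }


# Usage with placeholder IDs and a short max_length for readability.
example = build_pair_inputs([7592], [2088, 999], max_length=8)
print(example)
```

In practice the `BertTokenizer` named in `tokenizer_class` produces these fields directly, including truncation; the sketch only makes the role of `[CLS]`, `[SEP]`, and `[PAD]` explicit.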
|
vocab.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|