---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# Model Card for MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) using the KL-divergence distillation technique described [here](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
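
A minimal sketch of the KL-distillation objective, assuming listwise distillation over n-way candidate lists (illustrative only; the linked post describes the full recipe):

```python
import torch.nn.functional as F

def kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """student_logits, teacher_logits: (batch, n_way) relevance scores for the
    same query/passage groups. The student's softmax distribution over the
    candidates is pulled toward the teacher's."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```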

# Usage

## Usage with Transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

# Tokenize (query, passage) pairs; the model outputs one relevance logit per pair.
queries = ["How many people live in Berlin?", "How many people live in Berlin?"]
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
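
The logits are unnormalized relevance scores: higher means more relevant. A short follow-up, reusing `passages` and `scores` from the block above, to order the candidates:

```python
# Pair each passage with its score and sort best-first.
ranked = sorted(zip(passages, scores.squeeze(-1).tolist()), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:.3f}\t{passage}")
```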


## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
# Each item is a (query, passage) pair; predict returns one score per pair.
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```

# Evaluation

## BEIR (NDCG@10)

I've run tests on several BEIR datasets. The cross-encoders rerank the top-100 BM25 results (a sketch of this pipeline follows the table).


|                           | nq*        | fever*     | fiqa      | trec-covid   | scidocs   | scifact   | nfcorpus   | hotpotqa   | dbpedia-entity   | quora     | climate-fever   |
|:--------------------------|:----------|:----------|:----------|:-------------|:----------|:----------|:-----------|:-----------|:-----------------|:----------|:----------------|
| bm25                      | 0.305     | 0.638     | 0.238     | 0.589        | 0.150     | 0.676     | 0.318      | 0.629      | 0.319            | 0.787     | 0.163           |
| jina-reranker-v1-turbo-en | 0.533     | 0.852     | 0.336     | 0.774        | 0.166     | 0.739     | 0.353      | 0.745      | 0.421            | 0.858     | 0.233           |
| bge-reranker-v2-m3        | **0.597** | 0.857     | **0.397** | 0.784        | 0.169     | 0.731     | 0.336      | **0.794**  | **0.445**        | 0.858     | **0.314**       |
| mxbai-rerank-base-v1      | 0.535     | 0.767     | 0.382     | **0.830**    | **0.171** | 0.719     | **0.353**  | 0.668      | 0.416            | 0.747     | 0.253           |
| ms-marco-MiniLM-L-6-v2    | 0.523     | 0.801     | 0.349     | 0.741        | 0.164     | 0.688     | 0.349      | 0.724      | 0.445            | 0.825     | 0.244           |
| MiniLM-L-6-rerank-reborn      | 0.580     | **0.867** | 0.364     | 0.738        | 0.165     | **0.750** | 0.350      | 0.775      | 0.444            | **0.871** | 0.309           |

\* Training splits of NQ and Fever were used as part of the training data.
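
For reference, the evaluation pipeline behind the table is: retrieve the top-100 passages per query with BM25, rescore them with the cross-encoder, and sort by score. A minimal sketch, assuming the BM25 top-100 list is already available (the retriever itself is out of scope here):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)

def rerank(query, bm25_top100):
    """bm25_top100: up to 100 passage strings from a BM25 retriever."""
    scores = model.predict([(query, passage) for passage in bm25_top100])
    # NDCG@10 in the table is computed on this reranked ordering.
    order = sorted(zip(bm25_top100, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in order]
```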

Comparison with the [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-reborn-ablated), trained only on MS MARCO:
|                                     |     nq |   fever |   fiqa |   trec-covid |   scidocs |   scifact |   nfcorpus |   hotpotqa |   dbpedia-entity |   quora |   climate-fever |
|:------------------------------------|-------:|--------:|-------:|-------------:|----------:|----------:|-----------:|-----------:|-----------------:|--------:|----------------:|
| ms-marco-MiniLM-L-6-v2              | 0.5234 |  0.8007 | 0.349  |       0.741  |    0.1638 |    0.688  |     0.3493 |     0.7235 |           0.4445 |  0.8251 |          0.2438 |
| MiniLM-L-6-rerank-reborn-ablated    | 0.5412 |  0.8221 | 0.3598 |       0.7331 |    0.163  |    0.7376 |     0.3495 |     0.7583 |           0.4382 |  0.8619 |          0.2449 |
| improvement (%)                     | **3.40** |  **2.67** | **3.08** |      -1.07 |   -0.47 |    **7.22**  |     **0.08** |     **4.80** |          -1.41 |  **4.45** |          **0.47** |


# Datasets Used

~900k queries with 32-way triplets were used from these datasets (an illustrative record layout follows the list):

* MS MARCO
* TriviaQA
* Natural Questions
* FEVER
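
For illustration, one 32-way example pairs a query with 32 candidate passages plus one teacher score per passage, which is what the KL objective sketched above consumes. A hypothetical record layout (field names are assumptions, not the exact schema of [lightonai/ms-marco-en-bge](https://huggingface.co/datasets/lightonai/ms-marco-en-bge)):

```python
# Hypothetical shape of one training example; real examples carry 32 passages.
example = {
    "query": "how many people live in berlin",
    "passages": [
        "Berlin has a population of 3,520,031 registered inhabitants.",
        "New York City is famous for the Metropolitan Museum of Art.",
    ],
    "teacher_scores": [11.2, -7.9],  # bge-reranker-v2-m3 scores, one per passage
}
```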