# Model Card for juanluisdb/MiniLM-L-6-rerank-m3
This model is fine-tuned from the well-known cross-encoder/ms-marco-MiniLM-L-6-v2 using the KL distillation technique described here, with bge-reranker-v2-m3 as the teacher.
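For intuition, below is a minimal sketch of a KL-distillation step. The listwise setup (softmax over one query's candidate passages) and the temperature are assumptions for illustration, not verbatim training code:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: one query with 32 candidate passages. In training,
# student_logits would come from the MiniLM cross-encoder and teacher_logits
# from bge-reranker-v2-m3 (precomputed); here both are placeholders.
student_logits = torch.randn(1, 32, requires_grad=True)
teacher_logits = torch.randn(1, 32)

# KL divergence between the listwise softmax distributions. The temperature T
# is an assumed hyperparameter, not taken from this model card.
T = 1.0
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
)
loss.backward()
```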
## Usage

### Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")

# Score two (query, passage) pairs in a single batch.
queries = ["How many people live in Berlin?", "How many people live in Berlin?"]
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits  # higher logit = more relevant
print(scores)
```
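The logits are raw relevance scores, one per (query, passage) pair. How you consume them is up to you; one minimal pattern (an illustration, not part of the original card) is to sort the candidates by descending score:

```python
# Rank the candidate passages for the query by descending relevance score.
ranked = sorted(zip(passages, scores.squeeze(-1).tolist()), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```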
### Usage with SentenceTransformers
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
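Newer sentence-transformers releases also provide a `CrossEncoder.rank` convenience method. A short sketch, assuming your installed version includes it:

```python
# Rerank a list of passages for one query; results come back sorted by score.
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
results = model.rank(query, passages, return_documents=True)
for hit in results:
    print(f"{hit['score']:.3f}  {hit['text']}")
```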
## Evaluation

### BEIR (NDCG@10)
I've run tests on several BEIR datasets. The cross-encoders rerank the top-100 BM25 results (a minimal sketch of this protocol follows the first table).
| | bm25 | jina-reranker-v1-turbo-en | bge-reranker-v2-m3 | mxbai-rerank-base-v1 | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3 |
|---|---|---|---|---|---|---|
| nq* | 0.305 | 0.533 | 0.597 | 0.535 | 0.523 | 0.580 |
| fever* | 0.638 | 0.852 | 0.857 | 0.767 | 0.801 | 0.867 |
| fiqa | 0.238 | 0.336 | 0.397 | 0.382 | 0.349 | 0.364 |
| trec-covid | 0.589 | 0.774 | 0.784 | 0.830 | 0.741 | 0.738 |
| scidocs | 0.15 | 0.166 | 0.169 | 0.171 | 0.164 | 0.165 |
| scifact | 0.676 | 0.739 | 0.731 | 0.719 | 0.688 | 0.750 |
| nfcorpus | 0.318 | 0.353 | 0.336 | 0.353 | 0.349 | 0.350 |
| hotpotqa | 0.629 | 0.745 | 0.794 | 0.668 | 0.724 | 0.775 |
| dbpedia-entity | 0.319 | 0.421 | 0.445 | 0.416 | 0.445 | 0.444 |
| quora | 0.787 | 0.858 | 0.858 | 0.747 | 0.825 | 0.871 |
| climate-fever | 0.163 | 0.233 | 0.314 | 0.253 | 0.244 | 0.309 |
\* Training splits of NQ and FEVER were used as part of the training data.
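For reference, a minimal sketch of the reranking step used in this evaluation, assuming the BM25 candidates `(doc_id, text)` are already retrieved; the helper name is illustrative, and NDCG@10 itself would be computed with a standard tool such as pytrec_eval or the BEIR toolkit:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)

def rerank_top100(query, bm25_hits):
    """Rerank BM25 candidates, given as (doc_id, text) tuples, with the cross-encoder."""
    candidates = bm25_hits[:100]
    pairs = [(query, text) for _, text in candidates]
    scores = model.predict(pairs)
    reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(doc_id, float(score)) for (doc_id, _), score in reranked]
```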
Comparison with an ablated model trained only on MS MARCO:
| | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3-ablated |
|---|---|---|
| nq | 0.5234 | 0.5412 |
| fever | 0.8007 | 0.8221 |
| fiqa | 0.3490 | 0.3598 |
| trec-covid | 0.7410 | 0.7331 |
| scidocs | 0.1638 | 0.1630 |
| scifact | 0.6880 | 0.7376 |
| nfcorpus | 0.3493 | 0.3495 |
| hotpotqa | 0.7235 | 0.7583 |
| dbpedia-entity | 0.4445 | 0.4382 |
| quora | 0.8251 | 0.8619 |
| climate-fever | 0.2438 | 0.2449 |
## Datasets Used
~900k queries with 32-way triplets were used from these datasets (a sketch of one such example follows the list):
- MSMarco
- TriviaQA
- Natural Questions
- FEVER
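For illustration only, one 32-way training example might be laid out as below; the field names and score values are assumptions, not the released data format:

```python
# Hypothetical layout of one training example: a query, 32 candidate passages
# (typically one positive plus mined negatives), and the teacher's
# (bge-reranker-v2-m3) score for each pair. The student is trained to match
# the teacher's score distribution over the 32 candidates via KL divergence.
example = {
    "query": "How many people live in Berlin?",
    "passages": ["Berlin has a population of ...", "..."],  # 32 passages in a real example
    "teacher_scores": [9.1, -2.3],                          # 32 teacher logits, one per passage
}
```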