Cross-Encoder for MS Marco

This model uses BERT-Tiny, a tiny BERT model with only 2 layers, 2 attention heads, and a hidden size of 128.

It was trained on the MS Marco Passage Ranking task.

The model can be used for Information Retrieval: given a query, encode the query with each candidate passage (e.g. retrieved with ElasticSearch) to obtain a relevance score, then sort the passages by score in decreasing order. See Information Retrieval for more details. The training code is available here: Training MS Marco

Usage and Performance

Pre-trained models can be used like this:

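from sentence_transformers import CrossEncoder

# 'model_name' is a placeholder; use one of the model names from the table below.
model = CrossEncoder('model_name', max_length=512)

# Each input is a (query, passage) pair; predict returns one score per pair.
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])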
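
To turn the predicted scores into a ranking, the passages can be sorted by score. The following is a minimal sketch, not the library's own reranking routine: `rerank` is a hypothetical helper, and `predict` is assumed to be any callable that takes a list of (query, passage) pairs and returns one score per pair, such as `model.predict` above.

```python
def rerank(query, passages, predict):
    """Return (passage, score) tuples sorted by score, highest first.

    `predict` is assumed to accept a list of (query, passage) pairs
    and return one numeric score per pair.
    """
    pairs = [(query, passage) for passage in passages]
    scores = predict(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


# Usage with a Cross-Encoder ('model_name' is a placeholder):
# from sentence_transformers import CrossEncoder
# model = CrossEncoder('model_name', max_length=512)
# ranked = rerank('Query', ['Paragraph1', 'Paragraph2', 'Paragraph3'], model.predict)
```

Passing the scoring function as an argument keeps the sorting logic independent of any particular model.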

In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset.

| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec (BertTokenizerFast) | Docs / Sec (Python Tokenizer) |
| --- | --- | --- | --- | --- |
| cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780 |
| cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760 |
| cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660 |
| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | 340 |
| Other models | | | | |
| nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | 760 |
| nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | 340 |
| nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | 100 |
| Capreolus/electra-base-msmarco | 71.23 | | 340 | 340 |
| amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | | 330 | 330 |
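
For readers unfamiliar with the MRR@10 column, the metric averages, over all queries, the reciprocal rank of the first relevant passage within the top 10 results (0 if none appears there). The sketch below illustrates the definition only; it is not the official MS Marco evaluation script, and `mrr_at_k` is a hypothetical helper name. The table appears to report the value multiplied by 100.

```python
def mrr_at_k(rankings, k=10):
    """Mean reciprocal rank at cutoff k.

    `rankings` holds one list per query with 0/1 relevance labels,
    ordered as the passages were ranked (best first).
    """
    if not rankings:
        return 0.0
    total = 0.0
    for labels in rankings:
        for rank, label in enumerate(labels[:k], start=1):
            if label:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(rankings)
```

For example, three queries whose first relevant passage sits at rank 2, at rank 1, and nowhere in the top 10 give (0.5 + 1 + 0) / 3 = 0.5.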

Note: Runtime was computed on a V100 GPU. A bottleneck for smaller models is the standard Python tokenizer from Huggingface Transformers version 3. Replacing it with the fast, Rust-based tokenizer improves throughput significantly, as the two Docs / Sec columns in the table show.
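
One way to swap in the fast tokenizer is sketched below. This is a sketch under assumptions rather than a documented recipe: `enable_fast_tokenizer` and its `loader` parameter are hypothetical names, and the code assumes the CrossEncoder object exposes an assignable `tokenizer` attribute, which should be checked against the installed sentence-transformers version. Recent Transformers releases load fast tokenizers by default, so the swap mainly matters for older setups.

```python
def enable_fast_tokenizer(model, model_name, loader=None):
    """Replace model.tokenizer with a fast (Rust-based) tokenizer.

    `loader` defaults to transformers.AutoTokenizer.from_pretrained;
    it is injectable here only so the helper can be exercised offline.
    """
    if loader is None:
        from transformers import AutoTokenizer
        loader = AutoTokenizer.from_pretrained
    # Assumption: the model reads its tokenizer from the `tokenizer` attribute.
    model.tokenizer = loader(model_name, use_fast=True)
    return model


# Usage ('model_name' is a placeholder):
# from sentence_transformers import CrossEncoder
# model = CrossEncoder('model_name', max_length=512)
# model = enable_fast_tokenizer(model, 'model_name')
```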
