
Cross-Encoder for MS Marco

This model is based on BERT-Tiny, a small BERT model with only 2 layers, 2 attention heads, and a hidden size of 128.

It was trained on the MS Marco Passage Ranking task.

The model can be used for Information Retrieval: given a query, encode the query together with all candidate passages (e.g. retrieved with ElasticSearch), then sort the passages in decreasing order of score. See SBERT.net Information Retrieval for more details. The training code is available here: SBERT.net Training MS Marco

Usage and Performance

Pre-trained models can be used like this:

from sentence_transformers import CrossEncoder

# 'model_name' is a placeholder for one of the model names listed in the table below
model = CrossEncoder('model_name', max_length=512)

# Each input is a (query, passage) pair; the output is one relevance score per pair
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
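For re-ranking, the returned scores can then be used to sort the candidate passages. The following is a minimal sketch; the query and passages below are illustrative placeholders, and the model name is one of the cross-encoders from the table in the next section:

from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2', max_length=512)

# Hypothetical query and candidate passages, e.g. retrieved with ElasticSearch
query = 'How many people live in Berlin?'
passages = [
    'Berlin had a registered population of roughly 3.6 million inhabitants.',
    'Berlin is well known for its museums.',
    'New York City is famous for the Metropolitan Museum of Art.',
]

# Score each (query, passage) pair and sort the passages by decreasing score
scores = model.predict([(query, passage) for passage in passages])
ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)

for passage, score in ranked:
    print(f'{score:.4f}\t{passage}')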

In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset.

| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec (BertTokenizerFast) | Docs / Sec (Python Tokenizer) |
| --- | --- | --- | --- | --- |
| cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780 |
| cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760 |
| cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660 |
| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | 340 |
| Other models | | | | |
| nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | 760 |
| nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | 340 |
| nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | 100 |
| Capreolus/electra-base-msmarco | 71.23 | - | 340 | 340 |
| amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | - | 330 | 330 |

Note: Runtime was computed on a V100 GPU. For the smaller models, a bottleneck is the standard Python tokenizer from Huggingface Transformers version 3. Replacing it with the Rust-based fast tokenizer improves throughput significantly:
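A minimal sketch of how the fast tokenizer could be requested, assuming the tokenizer_args parameter of CrossEncoder (which is forwarded to the underlying Huggingface tokenizer loader); the exact mechanism may differ between sentence-transformers and transformers versions:

from sentence_transformers import CrossEncoder

# Sketch: request the Rust-based fast tokenizer when loading the model.
# Whether 'use_fast' must be passed explicitly depends on the installed
# transformers version (the fast tokenizer became the default in later releases).
model = CrossEncoder('model_name', max_length=512, tokenizer_args={'use_fast': True})
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2')])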