Using mxbai-embed-large-v1 in a CrossEncoder
Dear mixedbread-ai team,
I was looking through the MTEB leaderboard for Reranking and noticed that "mxbai-embed-large-v1" appears in the rankings.
The Hugging Face model card does not explicitly state that it can be used for reranking, so may I ask whether it is compatible with sentence-transformers' CrossEncoder class?
Hi @crestero , thank you for following our work. mxbai-embed-large-v1 is a model that produces single dense vectors. For cross encoders, you could use our rerankers: https://huggingface.co/collections/mixedbread-ai/reranking-series-6605a44260cba6d2eec7d4de . They are open source and come in various sizes.
Hi @SeanLee97 , appreciate the quick reply. Noted on that. I will try a retrieve-rerank pipeline with mxbai-embed-large-v1 for retrieval and mxbai-rerank-large-v1 for reranking.
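That two-stage pipeline can be sketched as follows. Note this is a minimal sketch of the retrieve-then-rerank flow only: the `embed` and `rerank_score` functions here are deliberately toy stand-ins (hashed bag-of-words and word overlap) for mxbai-embed-large-v1 and mxbai-rerank-large-v1; in practice you would load the real models via sentence-transformers instead.

```python
# Sketch of a retrieve-then-rerank pipeline. embed() and rerank_score() are
# toy stand-ins for the real bi-encoder and cross-encoder models.
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bi-encoder: hash each token into a fixed-size bag-of-words vector.
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rerank_score(query: str, doc: str) -> float:
    # Toy cross-encoder: scores the (query, doc) pair jointly, here by word overlap.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def search(query: str, docs: list[str], top_k: int = 3, rerank_k: int = 2) -> list[str]:
    # Stage 1 (retrieval): embed each text once, keep top_k by cosine similarity.
    q_emb = embed(query)
    doc_embs = [embed(d) for d in docs]
    candidates = sorted(
        range(len(docs)), key=lambda i: cosine(q_emb, doc_embs[i]), reverse=True
    )[:top_k]
    # Stage 2 (reranking): score only the candidate pairs with the cross-encoder.
    reranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:rerank_k]]


docs = [
    "mxbai-embed-large-v1 produces dense embeddings",
    "bananas are yellow",
    "rerankers score query document pairs",
]
print(search("how do rerankers score a query", docs))
```

The key design point is that the expensive pair-wise scorer only ever sees the handful of candidates the cheap retriever surfaces, never the full corpus.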
For the reranking series, are those models on the MTEB reranking leaderboard as of now? I could not find them there.
Hi @SeanLee97 , does your team have plans to add these models to the leaderboard?
It would be nice for users to be aware of their performance relative to the SOTA models on the board. From a qualitative perspective, it would also be a good push for more users to adopt those reranker models.
For context @crestero , many of the MTEB benchmarks are infeasible to run with cross-encoder (i.e. reranker) models, because they require an inference for every query-document pair rather than one inference per text as with bi-encoder (Sentence Transformer) models.
So, the NQ retrieval benchmark with 3,452 queries and 2.68M documents takes ~2.68M inferences with a bi-encoder model, but 3,452 × 2.68M = 9,251,360,000 inferences with a cross-encoder.
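The arithmetic behind that comparison is simply linear vs. quadratic cost in the corpus:

```python
# Inference counts for the NQ example: a bi-encoder embeds each text once,
# while a cross-encoder must run a forward pass for every (query, document) pair.
queries = 3_452
documents = 2_680_000

bi_encoder_inferences = queries + documents     # one pass per text (~2.68M)
cross_encoder_inferences = queries * documents  # one pass per pair

print(bi_encoder_inferences)   # 2683452
print(cross_encoder_inferences)  # 9251360000
```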
This is why cross-encoders are often used to rerank a query against a few documents (e.g. 100), and why cross-encoders aren't always on bi-encoder leaderboards.
They kind of need a separate leaderboard where BM25 is used as the retriever and each model reranks its top 100 results, and the relative performance gain over plain BM25 is measured.
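To make the first stage of that setup concrete, here is a minimal sketch of an Okapi BM25 scorer (the standard k1=1.5, b=0.75 variant), i.e. the kind of fixed lexical retriever such a leaderboard would standardize on before handing the top 100 to each reranker. In practice you would use an established implementation rather than this sketch.

```python
# Minimal Okapi BM25 scorer: ranks documents by term frequency, inverse
# document frequency, and document-length normalization.
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / N
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl)
            )
        scores.append(score)
    return scores


docs = [
    "bm25 is a bag of words retrieval function",
    "rerankers refine the candidates a retriever returns",
    "bm25 ranks documents by term frequency and inverse document frequency",
]
scores = bm25_scores("bm25 retrieval", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Since BM25 is purely lexical and cheap, it makes a reproducible baseline; the leaderboard metric would then be each reranker's gain over these raw BM25 rankings.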
- Tom Aarsen
Hi @tomaarsen @SeanLee97 , thank you for your replies. That makes sense; running inference over all pairs is too computationally intensive.
Just a side question: can the models in mxbai's reranking series perform symmetric ranking? Usually we rerank documents (information that answers the query) against the query, but can these cross-encoders also rank questions against a query in a symmetric fashion?