
---
base_model:
  - Qwen/Qwen3-4B
language:
  - en
library_name: sentence-transformers
license: cc-by-nc-4.0
pipeline_tag: text-ranking
tags:
  - finance
  - legal
  - code
  - stem
  - medical
---

Releasing zeroentropy/zerank-1

This model is the zerank-1 reranker introduced in the paper zELO: ELO-inspired Training Method for Rerankers and Embedding Models.

Code: https://github.com/zeroentropy-ai/zbench

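In search engines, rerankers are crucial for improving the accuracy of your retrieval system. A fast first-stage retriever (such as BM25 or an embedding model) returns a set of candidate documents, and the reranker then rescores each query-document pair so that the most relevant documents rise to the top.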
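The two-stage pattern can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: `retrieve_then_rerank`, `word_overlap`, and `toy_rerank_scores` are illustrative names, not part of zerank-1 or any library. The word-overlap scorer stands in for a real first-stage retriever such as BM25 or an embedding model, and `toy_rerank_scores` is a hard-coded stand-in for a reranker. Its batch signature (a list of (query, document) pairs in, one score per pair out) mirrors `CrossEncoder.predict`, so a loaded model's `predict` method could be passed in its place.

```python
from typing import Callable, Sequence


def word_overlap(query: str, doc: str) -> float:
    """Hypothetical first-stage scorer: number of shared lowercase tokens (a crude BM25 stand-in)."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_rerank_scores(pairs: list[tuple[str, str]]) -> list[float]:
    """Hypothetical reranker stand-in, hard-coded for this demo: scores the document "4" highest."""
    return [1.0 if doc.strip() == "4" else 0.0 for _, doc in pairs]


def retrieve_then_rerank(
    query: str,
    corpus: list[str],
    first_stage_score: Callable[[str, str], float],
    rerank_scores: Callable[[list[tuple[str, str]]], Sequence[float]],
    candidates: int = 100,
    top_k: int = 10,
) -> list[str]:
    # Stage 1: score every document cheaply and keep only the top `candidates`
    shortlist = sorted(corpus, key=lambda doc: first_stage_score(query, doc), reverse=True)[:candidates]
    # Stage 2: rescore just the shortlist with the (more expensive) reranker, in one batch
    scores = rerank_scores([(query, doc) for doc in shortlist])
    order = sorted(range(len(shortlist)), key=lambda i: scores[i], reverse=True)
    return [shortlist[i] for i in order[:top_k]]


corpus = ["The answer is definitely 1 million", "4"]
print(retrieve_then_rerank("What is 2+2?", corpus, word_overlap, toy_rerank_scores, top_k=1))
```

Here word overlap alone ranks the "1 million" document first (it shares the word "is" with the query), and the reranking stage moves "4" to the top. Note that the reranker can only reorder what the first stage returns: with `candidates=1`, the correct answer never reaches it.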

However, most state-of-the-art (SOTA) rerankers are closed-source and proprietary. At ZeroEntropy, we've trained a SOTA reranker that outperforms closed-source competitors, and we're releasing it here on Hugging Face.

This reranker outperforms proprietary rerankers such as cohere-rerank-v3.5 and Salesforce/LlamaRank-v1 across a wide variety of domains, including finance, legal, code, STEM, medical, and conversational data.

At ZeroEntropy, we've developed an innovative multi-stage training pipeline that models query-document relevance scores as adjusted Elo ratings. More details are available in our paper: zELO: ELO-inspired Training Method for Rerankers and Embedding Models.
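For intuition about treating relevance as a rating: in the standard Elo system, each pairwise "match" shifts two ratings by an amount that depends on how surprising the outcome was. The sketch below applies textbook Elo updates to hypothetical pairwise relevance judgments for a single query, assuming each comparison is recorded as (more relevant document, less relevant document). It is only an illustration of plain Elo, not the zELO method; the paper describes how the actual pipeline derives and adjusts these ratings. `elo_ratings` and `expected_score` are illustrative names.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(
    docs: list[str],
    comparisons: list[tuple[str, str]],
    k: float = 32.0,
    initial: float = 1000.0,
) -> dict[str, float]:
    """Sequential Elo updates; each comparison is (more_relevant_doc, less_relevant_doc)."""
    ratings = {doc: initial for doc in docs}
    for winner, loser in comparisons:
        # The less expected the win, the larger the rating change
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta  # zero-sum: the loser drops by the same amount
    return ratings


# Hypothetical pairwise judgments for one query: a > b, a > c, b > c
ratings = elo_ratings(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")])
print(sorted(ratings, key=ratings.get, reverse=True))
```

Sequential Elo updates depend on the order in which comparisons are processed; fitting a pairwise model such as Bradley-Terry to all comparisons at once avoids that sensitivity.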

Since we're a small company, we're releasing this model only under a non-commercial license (CC BY-NC 4.0). If you'd like a commercial license, please contact us at founders@zeroentropy.dev and we'll get you one ASAP.

For this model's smaller twin, see zerank-1-small, which we've fully open-sourced under the Apache 2.0 license.

How to Use

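from sentence_transformers import CrossEncoder

# trust_remote_code=True allows the custom model code in the Hub repository to run
model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)

# Each entry is a (query, document) pair to score
query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]

# One relevance score per pair, in input order; higher means more relevant
scores = model.predict(query_documents)

print(scores)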
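`model.predict` returns one score per (query, document) pair, in the same order as the input. To turn those scores into a ranking, sort the documents by descending score. The helper below is a minimal sketch with a hypothetical name (`rank_by_score`); it uses illustrative placeholder scores, not real zerank-1 output, so it runs without downloading the model.

```python
from typing import Optional, Sequence


def rank_by_score(
    documents: Sequence[str], scores: Sequence[float], top_k: Optional[int] = None
) -> list[tuple[str, float]]:
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("expected exactly one score per document")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Illustrative placeholder scores, NOT real zerank-1 output
docs = ["doc A", "doc B", "doc C"]
placeholder_scores = [0.1, 0.7, 0.4]
print(rank_by_score(docs, placeholder_scores, top_k=2))
```

With the model loaded as above, the same helper would be called as `rank_by_score([doc for _, doc in query_documents], scores)`. Recent sentence-transformers releases also provide a `CrossEncoder.rank(query, documents)` method that performs this sorting for a single query; check that your installed version includes it.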

You can also run the model through ZeroEntropy's /models/rerank API endpoint.

Evaluations

NDCG@10 scores for zerank-1 compared with closed-source proprietary rerankers. Because a reranker only reorders an existing candidate list, OpenAI's text-embedding-3-small is first used to retrieve the top 100 candidate documents for each query, which each reranker then rescores.

Task	Embedding only (text-embedding-3-small)	cohere-rerank-v3.5	Salesforce/Llama-rank-v1	zerank-1-small	zerank-1
Code	0.678	0.724	0.694	0.730	0.754
Conversational	0.250	0.571	0.484	0.556	0.596
Finance	0.839	0.824	0.828	0.861	0.894
Legal	0.703	0.804	0.767	0.817	0.821
Medical	0.619	0.750	0.719	0.773	0.796
STEM	0.401	0.510	0.595	0.680	0.694
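For reference, NDCG@10 rewards placing relevant documents near the top of the first 10 results: each position's gain is discounted by log2(rank + 1), and the total is normalized by the best achievable ordering, so 1.0 is a perfect ranking. A minimal sketch, assuming graded relevance labels listed in ranked order and the linear-gain variant of DCG (some implementations use 2^rel - 1 as the gain instead). The hypothetical `ndcg_at_k` helper is not the zbench evaluation code, and it computes the ideal ordering from the supplied labels only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    # Linear gain: rel / log2(rank + 1), summed over ranks 1..k
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    # Ideal DCG: the same labels sorted best-first (assumes every relevant doc is in the list)
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Binary labels for a ranked list with relevant documents at ranks 2 and 3
print(round(ndcg_at_k([0, 1, 1]), 3))
```

Moving the two relevant documents to ranks 1 and 2 would raise the score to 1.0; improving this number on the top-100 candidate lists is exactly the job of the reranker.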

Comparing BM25 and Hybrid Search without and with zerank-1:
