aamirshakir's picture
Update README.md
03241da verified
|
raw
history blame
49 kB
metadata
library_name: transformers
tags:
  - reranker
  - transformers.js
license: apache-2.0
language:
  - en



The crispy rerank family from Mixedbread.

mxbai-rerank-base-v1

This is the base model in our family of powerful reranker models. You can learn more about the models in our blog post.

We have three models:

Quickstart

Currently, the best way to use our models is with the most recent version of sentence-transformers.

pip install -U sentence-transformers

Let's say you have a query, and you want to rerank a set of documents. You can do that with only one line of code:

from sentence_transformers import CrossEncoder

# Load the model, here we use our base sized model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")


# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Lets get the scores
results = model.rank(query, documents, return_documents=True, top_k=3)
JavaScript Example

Install transformers.js

npm i @xenova/transformers

Let's say you have a query, and you want to rerank a set of documents. In JavaScript, you need to add a function:

import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'mixedbread-ai/mxbai-rerank-base-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

/**
 * Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
 * @param {string} query A single query
 * @param {string[]} documents A list of documents
 * @param {Object} options Options for ranking
 * @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
 * @param {number} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
 */
async function rank(query, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    const inputs = tokenizer(
        new Array(documents.length).fill(query),
        {
            text_pair: documents,
            padding: true,
            truncation: true,
        }
    )
    const { logits } = await model(inputs);
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {})
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?"
const documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);

Using API

You can use the large model via our API as follows:

from mixedbread_ai.client import MixedbreadAI

mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

res = mxbai.reranking(
  model="mixedbread-ai/mxbai-rerank-large-v1",
  query="Who is the author of To Kill a Mockingbird?",
  input=[
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
  ],
  top_k=3,
  return_input=false
)

print(res.data)

The API comes with additional features, such as a continous trained reranker! Check out the docs for more information.

Evaluation

Our reranker models are designed to elevate your search. They work extremely well in combination with keyword search and can even outperform semantic search systems in many cases.

Model NDCG@10 Accuracy@3
Lexical Search (Lucene) 38.0 66.4
BAAI/bge-reranker-base 41.6 66.9
BAAI/bge-reranker-large 45.2 70.6
cohere-embed-v3 (semantic search) 47.5 70.9
mxbai-rerank-xsmall-v1 43.9 70.0
mxbai-rerank-base-v1 46.9 72.3
mxbai-rerank-large-v1 48.8 74.9

The reported results are aggregated from 11 datasets of BEIR. We used Pyserini to evaluate the models. Find more in our blog-post and on this spreadsheet.

Community

Please join our Discord Community and share your feedback and thoughts! We are here to help and also always happy to chat.

License

Apache 2.0