Instructions for using antflydb/mxbai-rerank-base-v1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use antflydb/mxbai-rerank-base-v1 with Transformers:

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("antflydb/mxbai-rerank-base-v1")
model = AutoModelForSequenceClassification.from_pretrained("antflydb/mxbai-rerank-base-v1")

llama-cpp-python

How to use antflydb/mxbai-rerank-base-v1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="antflydb/mxbai-rerank-base-v1",
	filename="mxbai-rerank-base-v1.Q4_K.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use antflydb/mxbai-rerank-base-v1 with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf antflydb/mxbai-rerank-base-v1:Q8_0
# Run inference directly in the terminal:
llama cli -hf antflydb/mxbai-rerank-base-v1:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf antflydb/mxbai-rerank-base-v1:Q8_0
# Run inference directly in the terminal:
llama cli -hf antflydb/mxbai-rerank-base-v1:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf antflydb/mxbai-rerank-base-v1:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf antflydb/mxbai-rerank-base-v1:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf antflydb/mxbai-rerank-base-v1:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf antflydb/mxbai-rerank-base-v1:Q8_0

Use Docker

docker model run hf.co/antflydb/mxbai-rerank-base-v1:Q8_0

Ollama
How to use antflydb/mxbai-rerank-base-v1 with Ollama:
```
ollama run hf.co/antflydb/mxbai-rerank-base-v1:Q8_0
```

Unsloth Studio

How to use antflydb/mxbai-rerank-base-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for antflydb/mxbai-rerank-base-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for antflydb/mxbai-rerank-base-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for antflydb/mxbai-rerank-base-v1 to start chatting

Docker Model Runner
How to use antflydb/mxbai-rerank-base-v1 with Docker Model Runner:
```
docker model run hf.co/antflydb/mxbai-rerank-base-v1:Q8_0
```

Lemonade

How to use antflydb/mxbai-rerank-base-v1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull antflydb/mxbai-rerank-base-v1:Q8_0

Run and chat with the model

lemonade run user.mxbai-rerank-base-v1-Q8_0

List all available models

lemonade list

mxbai-rerank-base-v1 for Antfly Inference

This repository packages Mixedbread's mxbai-rerank-base-v1 reranker for Antfly Inference deployments. It includes the original safetensors checkpoint plus Antfly Inference GGUF variants for local CPU/GPU inference.

The model is a cross-encoder reranker: given a query and a set of candidate documents, it scores each query/document pair so retrieval systems can re-order candidates after lexical or embedding search.
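
The second-stage role described above can be sketched in a few lines of Python. This is only an illustration: `rerank` and `toy_overlap_score` are hypothetical names, and the token-overlap scorer is a stand-in for the real model, which would supply the per-pair scores in practice.

```python
from typing import Callable, List, Tuple


def toy_overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[int, float]]:
    """Score each (query, document) pair independently, then sort by score.

    Returns (original_index, score) tuples, best first.
    """
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(documents)]
    # sorted() is stable, so tied documents keep their first-stage order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Candidates as they might arrive from a first-stage retriever.
candidates = [
    "Moby-Dick was written by Herman Melville.",
    "To Kill a Mockingbird is a novel by Harper Lee.",
]
ranking = rerank("who wrote to kill a mockingbird", candidates, toy_overlap_score)
```

Returning original indices rather than the document strings lets the caller map the new order back onto whatever metadata the first-stage retriever produced.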

Files

| File | Purpose |
| --- | --- |
| `model.safetensors` | Original fp32/fp16-compatible Transformers checkpoint |
| `mxbai-rerank-base-v1.Q8_0.gguf` | Higher-fidelity GGUF quantization |
| `mxbai-rerank-base-v1.Q4_K.gguf` | Smaller GGUF quantization for lower memory use |
| `config.json` | Model architecture/configuration |
| `tokenizer.json`, `spm.model`, tokenizer sidecars | Tokenization assets |
| `model_manifest.json` | Antfly Inference model capability manifest |
| `antfly_inference_variants.json` | Antfly Inference GGUF variant index |

Intended Uses

Reranking search results from BM25, vector search, or hybrid retrieval
Improving top-k precision in RAG pipelines
Local reranking in Antfly Inference services
Offline evaluation of reranking quality and quantization drift

How to Use with Antfly Inference

antfly inference pull antflydb/mxbai-rerank-base-v1:gguf:Q8_0
antfly inference run

For the smaller, ranking-oriented artifact, replace the `:gguf:Q8_0` suffix with `:gguf:Q4_K`.

curl -X POST http://localhost:8082/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "antflydb/mxbai-rerank-base-v1",
    "query": "Who wrote To Kill a Mockingbird?",
    "documents": [
      "To Kill a Mockingbird is a novel by Harper Lee.",
      "Moby-Dick was written by Herman Melville.",
      "Jane Austen wrote Pride and Prejudice."
    ]
  }'
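
The same request can be issued from Python with only the standard library. The sketch below reuses the endpoint and field names from the curl call; the helper names are hypothetical, and because the response schema is not described here, the body is returned as decoded JSON without assuming any fields (it also assumes the server replies with JSON).

```python
import json
import urllib.request

# URL and field names come from the curl example; host and port assume a
# local deployment listening on 8082.
RERANK_URL = "http://localhost:8082/rerank"


def build_rerank_request(query, documents, model="antflydb/mxbai-rerank-base-v1"):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_rerank_request(request, timeout=30.0):
    """Send the request and return the decoded JSON body unchanged.

    Assumption: the server answers with a JSON body; no field names are assumed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_rerank_request(
    "Who wrote To Kill a Mockingbird?",
    [
        "To Kill a Mockingbird is a novel by Harper Lee.",
        "Moby-Dick was written by Herman Melville.",
    ],
)
# With the service running: result = send_rerank_request(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running service.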

Quantized Variants

The GGUF files were exported with Antfly's inference exporter from the mixedbread-ai/mxbai-rerank-base-v1 safetensors source.

| Variant | File | Size | Notes |
| --- | --- | --- | --- |
| Q8_0 | `mxbai-rerank-base-v1.Q8_0.gguf` | ~278 MB | Better score fidelity; DeBERTa word embeddings remain dense F16 for CUDA compatibility |
| Q4_K | `mxbai-rerank-base-v1.Q4_K.gguf` | ~103 MB | Smaller footprint; rank-validated, with larger absolute score drift than Q8_0 |

Validation

Validated locally with Antfly CUDA rerank on:

Basic relevance ranking
Empty-document handling
Long-input truncation
Multi-document ordering

Results versus the safetensors CUDA baseline:

| Variant | Top-1 Ordering | Max Absolute Score Drift |
| --- | --- | --- |
| Q8_0 | Preserved on all validation cases | 0.0029 |
| Q4_K | Preserved on all validation cases | 0.1261 |
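
For readers who want to run a similar comparison on their own data, the two reported metrics could be computed roughly as follows. This is a sketch, not the validation harness used for this package: the function names are hypothetical and the score lists are made-up illustrative values.

```python
from typing import Sequence


def top1_preserved(baseline: Sequence[float], quantized: Sequence[float]) -> bool:
    """True if the highest-scoring document index is the same in both runs."""
    if not baseline or len(baseline) != len(quantized):
        raise ValueError("score lists must be non-empty and of equal length")
    best_baseline = max(range(len(baseline)), key=lambda i: baseline[i])
    best_quantized = max(range(len(quantized)), key=lambda i: quantized[i])
    return best_baseline == best_quantized


def max_abs_drift(baseline: Sequence[float], quantized: Sequence[float]) -> float:
    """Largest absolute per-document score difference between the two runs."""
    if not baseline or len(baseline) != len(quantized):
        raise ValueError("score lists must be non-empty and of equal length")
    return max(abs(b - q) for b, q in zip(baseline, quantized))


# Illustrative scores only; these are not measurements from this model.
baseline_scores = [0.91, 0.12, 0.05]
quantized_scores = [0.88, 0.15, 0.02]

same_top1 = top1_preserved(baseline_scores, quantized_scores)
drift = max_abs_drift(baseline_scores, quantized_scores)
```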

Treat Q4_K as a compact, ranking-oriented artifact. Downstream systems that apply thresholds to absolute reranker scores should prefer `model.safetensors` or Q8_0.
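
To see why thresholding is sensitive to drift, consider a score that sits close to a cut-off. The sketch below uses the drift figures from the table above; the helper name, the baseline score, and the threshold are hypothetical. The reported drifts are maxima observed on the validation cases, not guaranteed bounds.

```python
def threshold_decision_is_stable(score: float, threshold: float, max_drift: float) -> bool:
    """True if shifting the score by up to max_drift cannot cross the threshold."""
    return abs(score - threshold) > max_drift


# Max absolute score drift reported in the validation table.
Q8_0_DRIFT = 0.0029
Q4_K_DRIFT = 0.1261

# Hypothetical values chosen for illustration.
baseline_score = 0.55
threshold = 0.50

stable_q8 = threshold_decision_is_stable(baseline_score, threshold, Q8_0_DRIFT)
stable_q4 = threshold_decision_is_stable(baseline_score, threshold, Q4_K_DRIFT)
```

Here the margin of 0.05 exceeds the Q8_0 drift but not the Q4_K drift, so a keep/drop decision at this threshold could flip under Q4_K even though the ranking itself is preserved.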

Limitations

This is an English-focused reranker inherited from the upstream Mixedbread model.
It scores query/document pairs independently and is intended as a second-stage ranker, not as a standalone document index.
Quantized GGUF files can change absolute scores; downstream systems should prefer rank/order checks over exact-score equality.

Source Model

This package is based on mixedbread-ai/mxbai-rerank-base-v1.

Citation

If you use this model, cite the upstream Mixedbread reranker work:

@online{rerank2024mxbai,
  title={Boost Your Search With The Crispy Mixedbread Rerank Models},
  author={Aamir Shakir and Darius Koenig and Julius Lipp and Sean Lee},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-rerank-v1},
}

License

Apache 2.0. See LICENSE.

Safetensors

Model size: 0.2B params
Tensor type: F16
