mxbai-rerank-base-v1 for Antfly Inference

This repository packages Mixedbread's mxbai-rerank-base-v1 reranker for Antfly Inference deployments. It includes the original safetensors checkpoint plus Antfly Inference GGUF variants for local CPU/GPU inference.

The model is a cross-encoder reranker: given a query and a set of candidate documents, it scores each query/document pair so retrieval systems can re-order candidates after lexical or embedding search.
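
As a rough illustration of this second-stage pattern, the sketch below re-orders candidates by a per-pair score. `overlap_score` is a hypothetical stand-in for the model (plain token overlap), included only so the example runs without the checkpoint; it is not how mxbai-rerank-base-v1 computes relevance. The `rerank` helper name is also an assumption, not part of Antfly Inference.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens present in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[int, float]]:
    """Score every query/document pair independently, then sort best-first.

    Returns (original_index, score) tuples so callers can map back to
    the candidate list produced by the first-stage retriever.
    """
    scored = [(index, score_fn(query, doc)) for index, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "To Kill a Mockingbird is a novel by Harper Lee.",
    "Moby-Dick was written by Herman Melville.",
    "Jane Austen wrote Pride and Prejudice.",
]
ranked = rerank("herman melville moby-dick", candidates, overlap_score)
```

In a real deployment, `score_fn` would be replaced by a call to the reranker, and `candidates` would come from BM25, vector search, or hybrid retrieval.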

Files

| File | Purpose |
| --- | --- |
| model.safetensors | Original fp32/fp16-compatible Transformers checkpoint |
| mxbai-rerank-base-v1.Q8_0.gguf | Higher-fidelity GGUF quantization |
| mxbai-rerank-base-v1.Q4_K.gguf | Smaller GGUF quantization for lower memory use |
| config.json | Model architecture/configuration |
| tokenizer.json, spm.model, tokenizer sidecars | Tokenization assets |
| model_manifest.json | Antfly Inference model capability manifest |
| antfly_inference_variants.json | Antfly Inference GGUF variant index |

Intended Uses

  • Reranking search results from BM25, vector search, or hybrid retrieval
  • Improving top-k precision in RAG pipelines
  • Local reranking in Antfly Inference services
  • Offline evaluation of reranking quality and quantization drift

How to Use with Antfly Inference

antfly inference pull antflydb/mxbai-rerank-base-v1:gguf:Q8_0
antfly inference run

To pull the smaller, ranking-oriented artifact instead, replace the :gguf:Q8_0 suffix with :gguf:Q4_K.

curl -X POST http://localhost:8082/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "antflydb/mxbai-rerank-base-v1",
    "query": "Who wrote To Kill a Mockingbird?",
    "documents": [
      "To Kill a Mockingbird is a novel by Harper Lee.",
      "Moby-Dick was written by Herman Melville.",
      "Jane Austen wrote Pride and Prejudice."
    ]
  }'
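
The same request can be issued from Python. This is a minimal sketch using only the standard library; the URL and JSON fields are copied from the curl example, while the helper names `build_rerank_request` and `post_rerank` are hypothetical. The response schema is not documented here, so the sketch returns the raw body rather than assuming its structure.

```python
import json
import urllib.request
from typing import List

RERANK_URL = "http://localhost:8082/rerank"  # endpoint taken from the curl example


def build_rerank_request(
    query: str,
    documents: List[str],
    model: str = "antflydb/mxbai-rerank-base-v1",
    url: str = RERANK_URL,
) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_rerank(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running locally, `print(post_rerank(build_rerank_request("Who wrote To Kill a Mockingbird?", docs)))` prints whatever the server returns for the list `docs`.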

Quantized Variants

The GGUF files were exported with Antfly's inference exporter from the mixedbread-ai/mxbai-rerank-base-v1 safetensors source.

| Variant | File | Size | Notes |
| --- | --- | --- | --- |
| Q8_0 | mxbai-rerank-base-v1.Q8_0.gguf | ~278 MB | Better score fidelity; DeBERTa word embeddings remain dense F16 for CUDA compatibility |
| Q4_K | mxbai-rerank-base-v1.Q4_K.gguf | ~103 MB | Smaller footprint; rank-validated, with larger absolute score drift than Q8_0 |

Validation

Both GGUF variants were validated locally with Antfly CUDA rerank on the following cases:

  • Basic relevance ranking
  • Empty-document handling
  • Long-input truncation
  • Multi-document ordering

Results versus the safetensors CUDA baseline:

| Variant | Top-1 Ordering | Max Absolute Score Drift |
| --- | --- | --- |
| Q8_0 | Preserved on all validation cases | 0.0029 |
| Q4_K | Preserved on all validation cases | 0.1261 |

Q4_K should be treated as a ranking-oriented compact artifact. Downstream systems that threshold absolute reranker scores should prefer model.safetensors or Q8_0.
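
One way to check a variant against your own data is to compute the same two quantities reported in the table, plus a count of threshold decisions that change. The sketch below assumes you already have per-document scores from the baseline and from a quantized variant; the function names and the sample scores are illustrative and are not the measurements reported above.

```python
from typing import List, Sequence


def max_abs_drift(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest absolute per-document score difference between two runs."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("score lists must be non-empty and the same length")
    return max(abs(b - c) for b, c in zip(baseline, candidate))


def top1_preserved(baseline: Sequence[float], candidate: Sequence[float]) -> bool:
    """True when both runs rank the same document first."""
    best_baseline = max(range(len(baseline)), key=lambda i: baseline[i])
    best_candidate = max(range(len(candidate)), key=lambda i: candidate[i])
    return best_baseline == best_candidate


def threshold_flips(
    baseline: Sequence[float], candidate: Sequence[float], threshold: float
) -> List[int]:
    """Indices whose keep/drop decision at `threshold` differs between runs."""
    return [
        index
        for index, (b, c) in enumerate(zip(baseline, candidate))
        if (b >= threshold) != (c >= threshold)
    ]


# Illustrative scores only, not measured values.
baseline_scores = [0.92, 0.55, 0.08]
quantized_scores = [0.85, 0.45, 0.12]

drift = max_abs_drift(baseline_scores, quantized_scores)
same_top1 = top1_preserved(baseline_scores, quantized_scores)
flipped = threshold_flips(baseline_scores, quantized_scores, threshold=0.5)
```

In this made-up case the top-ranked document is unchanged, yet the second document falls below a 0.5 cutoff after quantization, which is the failure mode that score thresholds are exposed to.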

Limitations

  • The reranker is English-focused, a property inherited from the upstream Mixedbread model.
  • It scores query/document pairs independently and is intended as a second-stage ranker, not as a standalone document index.
  • Quantized GGUF files can change absolute scores; downstream systems should prefer rank/order checks over exact-score equality.
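
A rank-based regression check along these lines compares orderings rather than raw scores. This is a sketch with hypothetical helper names and made-up scores, assuming per-document scores from two variants are available as lists in the same document order.

```python
from typing import List, Sequence


def ranking_order(scores: Sequence[float]) -> List[int]:
    """Document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def same_ranking(baseline: Sequence[float], candidate: Sequence[float]) -> bool:
    """True when both score lists induce the same document ordering."""
    return ranking_order(baseline) == ranking_order(candidate)


# Illustrative scores only: the values differ, the ordering does not.
reference_scores = [0.91, 0.12, 0.40]
variant_scores = [0.88, 0.15, 0.37]

scores_identical = reference_scores == variant_scores
order_identical = same_ranking(reference_scores, variant_scores)
```

An exact-equality assertion would fail here even though the ranking a downstream system sees is unchanged.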

Source Model

This package is based on mixedbread-ai/mxbai-rerank-base-v1.

Citation

If you use this model, cite the upstream Mixedbread reranker work:

@online{rerank2024mxbai,
  title={Boost Your Search With The Crispy Mixedbread Rerank Models},
  author={Aamir Shakir and Darius Koenig and Julius Lipp and Sean Lee},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-rerank-v1},
}

License

Apache 2.0. See LICENSE.
