baa.ai · Merino-Nano

One model that does both halves of RAG retrieval — bi-encoder embedding and cross-encoder reranking — over a single shared word-embedding table. A 384-dimensional English model, ~34M parameters, by BAA AI (Black Sheep AI).

Get the optimal model for your data

Merino-Nano is a strong, cost-efficient default. But the best embedder + reranker is corpus-specific — the ideal choice depends on your documents and your notion of relevance. baa.ai offers exclusive tooling that identifies the optimal embedding and reranking models for your specific data, so you ship the smallest models that maximize document recovery on your corpus. For a tailored recommendation, reach out to baa.ai.

What it is

A two-role retrieval model built around one shared input word-embedding matrix. The bi-encoder embedder and a compact cross-encoder reranker both use the MiniLM-L6-H384-uncased backbone, so the word-embedding table is stored once and injected into the reranker at load time. The result is a smaller download with no measured quality loss and no retraining.

  • Embed role: bi-encoder, 384-d, L2-normalized.
  • Rerank role: cross-encoder, single relevance logit per (query, document) pair.
  • Router: call .embed(...) or .rerank(...).
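The store-once, inject-at-load idea can be illustrated with plain dictionaries standing in for checkpoints. This is a minimal sketch under stated assumptions, not the loader shipped in modeling_baa: the key name `embeddings.word_embeddings.weight`, the helper `inject_shared_embeddings`, and the toy weights are all hypothetical.

```python
# Hypothetical key name; the real checkpoint layout may differ.
WORD_EMB_KEY = "embeddings.word_embeddings.weight"


def inject_shared_embeddings(embedder_state, reranker_state, key=WORD_EMB_KEY):
    """Return a reranker state dict that reuses the embedder's word-embedding table.

    Assumes the reranker checkpoint was saved WITHOUT the table, so the table
    exists once on disk and is shared by reference after loading.
    """
    if key not in embedder_state:
        raise KeyError(f"embedder checkpoint is missing {key!r}")
    if key in reranker_state:
        raise ValueError("reranker checkpoint already carries its own table")
    merged = dict(reranker_state)      # shallow copy; other weights untouched
    merged[key] = embedder_state[key]  # same object, not a second copy
    return merged


# Toy weights: a 3-token vocabulary with 2-d embeddings.
embedder_state = {
    WORD_EMB_KEY: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "encoder.layer.0.weight": [[1.0]],
}
reranker_state = {
    "encoder.layer.0.weight": [[2.0]],
    "classifier.weight": [[0.7, -0.7]],
}

reranker_loaded = inject_shared_embeddings(embedder_state, reranker_state)
```

After the call, both roles point at the same table object, while the reranker keeps its own encoder and classifier weights.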

Usage

from modeling_baa import BaaEmbeddingReranker   # included in this repo

m = BaaEmbeddingReranker("baa-ai/Merino-Nano")
query = "how does a cross-encoder reranker work?"

# Embed role: one 384-d, L2-normalized vector per input text
qv = m.embed([query], is_query=True)[0]
dv = m.embed(["a cross-encoder scores a (query, document) pair jointly",
              "bi-encoders embed query and document separately for fast retrieval"])

# Rerank role: one relevance logit per (query, document) pair
ranked = m.rerank(query,
                  ["a cross-encoder scores a (query, document) pair jointly",
                   "the mitochondria is the powerhouse of the cell"])
# -> [(doc, score), ...] sorted best-first
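The two roles are typically chained: embeddings shortlist candidates cheaply, then the reranker reorders the shortlist. The sketch below shows that flow with toy stand-ins so it runs without the model. `retrieve_then_rerank`, `toy_embed`, and `toy_rerank` are hypothetical names; the stand-ins only mimic the call shapes shown in the usage snippet and are not real encoders.

```python
import math

VOCAB = ["cross-encoder", "bi-encoders", "query", "mitochondria"]


def toy_embed(texts, is_query=False):
    """Stand-in for m.embed: L2-normalized keyword counts (not a real encoder)."""
    vecs = []
    for t in texts:
        words = t.lower().split()
        v = [float(words.count(w)) for w in VOCAB]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs


def toy_rerank(query, docs):
    """Stand-in for m.rerank: score is the number of shared words, best-first."""
    q = set(query.lower().split())
    scored = [(d, float(len(q & set(d.lower().split())))) for d in docs]
    return sorted(scored, key=lambda p: p[1], reverse=True)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(query, docs, embed, rerank, top_k=2):
    """Shortlist docs by embedding similarity, then reorder the shortlist."""
    qv = embed([query], is_query=True)[0]
    dvs = embed(docs)
    # For L2-normalized vectors the dot product equals cosine similarity.
    order = sorted(range(len(docs)), key=lambda i: dot(qv, dvs[i]), reverse=True)
    shortlist = [docs[i] for i in order[:top_k]]
    return rerank(query, shortlist)


docs = [
    "a cross-encoder scores a query and document jointly",
    "bi-encoders embed query and document separately",
    "the mitochondria is the powerhouse of the cell",
]
query = "how does a cross-encoder reranker work"
ranked = retrieve_then_rerank(query, docs, toy_embed, toy_rerank, top_k=2)
```

With the real model, passing `m.embed` and `m.rerank` in place of the stand-ins should work, assuming they accept the arguments shown in the usage snippet.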

Specs

Embedding dim: 384
Parameters: ~34M (embedder + reranker, shared word-embedding table)
Languages: English
Max sequence length: 512
Hardware: CPU / edge / GPU
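As a rough illustration of what storing the table once saves, the arithmetic below sizes a single word-embedding table. The vocabulary size is an assumption (the standard 30,522-token BERT uncased WordPiece vocabulary); the card itself does not state it, so treat the result as an estimate.

```python
vocab_size = 30_522  # assumption: standard BERT uncased WordPiece vocabulary
hidden_dim = 384     # embedding dim from the specs

table_params = vocab_size * hidden_dim
fp16_bytes = table_params * 2  # 2 bytes per fp16 value

print(f"word-embedding table: {table_params:,} parameters")
print(f"fp16 size of one copy: {fp16_bytes / 1e6:.1f} MB")
```

Under that assumption, one table is about 11.7M parameters, which is the amount a second, unshared copy would add to the download.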

License & attribution

  • BAA Contributions (shared-embedding architecture, router/loader code, packaging, weights, docs) are proprietary to BAA AI (Black Sheep AI) — see LICENSE.
  • Incorporates the MiniLM-L6-H384-uncased backbone under the MIT License — see LICENSE-minilm.txt.

© 2026 BAA AI (Black Sheep AI) — baa.ai. Provided "as is" without warranty.

Certification & corpus fit (2026-07)

Position Balance (PB): 0.16, the low end of the 0.16–0.69 fleet range. PB measures how findable a chunk is through its second fact when two facts share one embedding: it is the ratio of second-fact to first-fact top-1 retrieval on an adversarial 1,300-chunk audit. Compact deployment tier: use with strictly atomic chunking (one claim per embedded chunk) and parent-document retrieval.
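Read as a ratio of two top-1 hit rates, PB can be computed as in the sketch below. The audit protocol is not published in this card, so the function names and the toy outcomes are assumptions for illustration only; they are not the audit data.

```python
def top1_rate(hits):
    """Fraction of probes whose gold chunk was ranked first."""
    return sum(hits) / len(hits)


def position_balance(first_fact_hits, second_fact_hits):
    """Ratio of second-fact to first-fact top-1 retrieval rates."""
    first = top1_rate(first_fact_hits)
    if first == 0:
        raise ValueError("first-fact top-1 rate is zero; the ratio is undefined")
    return top1_rate(second_fact_hits) / first


# Toy outcomes for 10 two-fact chunks (True = gold chunk ranked first).
first_fact_hits = [True] * 8 + [False] * 2   # top-1 rate 0.8
second_fact_hits = [True] * 2 + [False] * 8  # top-1 rate 0.2

pb = position_balance(first_fact_hits, second_fact_hits)
```

A PB near 1 would mean a chunk is about as findable through its second fact as through its first; a low value means the second fact is mostly lost.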

4-bit quantization: certified lossless under distractor stress. Paired contested-region robustness (gold document injected into pools of up to 100 near-topical distractors, n=300 queries, bootstrap CIs) is statistically indistinguishable from fp16 — an axis standard hit@k benchmarks do not measure.
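One common way to test whether two variants are "statistically indistinguishable" on the same queries is a paired bootstrap over per-query outcomes. The sketch below is an assumption about how such a check might look, not the certification procedure itself: the function name, the percentile interval, and the toy hit lists are all hypothetical.

```python
import random


def paired_bootstrap_ci(fp16_hits, q4_hits, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean per-query hit difference (4-bit minus fp16).

    Queries are resampled as pairs, so each draw keeps both variants'
    outcomes for the same query together.
    """
    if len(fp16_hits) != len(q4_hits):
        raise ValueError("hit lists must cover the same queries")
    n = len(fp16_hits)
    diffs = [int(b) - int(a) for a, b in zip(fp16_hits, q4_hits)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data: 300 queries, 80% hit rate, with two queries flipping in opposite directions.
fp16_hits = [i % 5 != 0 for i in range(300)]
q4_hits = list(fp16_hits)
q4_hits[0] = True    # 4-bit recovers a query fp16 missed
q4_hits[1] = False   # 4-bit misses a query fp16 found

lo, hi = paired_bootstrap_ci(fp16_hits, q4_hits)
```

If the interval contains zero, the data give no evidence that the two variants differ on this metric.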

Chunking prescription: embed one atomic claim per chunk and lead with its key entity; retrieve small, return the parent section for context. Basis: single-vector embeddings preserve ~one independent fact per chunk regardless of encoder family (measured across 12 encoders).
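The prescription above, retrieve small and return the parent, can be sketched with a chunk-to-parent mapping. Everything here is illustrative: `Chunk`, `build_index`, `retrieve_parents`, and the word-overlap scorer are hypothetical, and splitting sections into atomic claims is assumed to happen upstream.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str       # one atomic claim, leading with its key entity
    parent_id: str  # section the claim came from


def build_index(sections):
    """sections: {parent_id: (full_text, [atomic claims])} -> (chunks, parents)."""
    chunks, parents = [], {}
    for pid, (full_text, claims) in sections.items():
        parents[pid] = full_text
        chunks.extend(Chunk(c, pid) for c in claims)
    return chunks, parents


def retrieve_parents(query, chunks, parents, score, top_k=1):
    """Rank small chunks, then return their de-duplicated parent sections."""
    ranked = sorted(chunks, key=lambda c: score(query, c.text), reverse=True)
    seen, out = set(), []
    for c in ranked[:top_k]:
        if c.parent_id not in seen:
            seen.add(c.parent_id)
            out.append(parents[c.parent_id])
    return out


def overlap(query, text):
    """Stand-in scorer: number of shared lowercase words (not an embedding)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


sections = {
    "sec-1": (
        "Merino-Nano embeds text into 384-d vectors. Merino-Nano supports English only.",
        ["Merino-Nano embeds text into 384-d vectors.",
         "Merino-Nano supports English only."],
    ),
    "sec-2": (
        "The reranker outputs one relevance logit per pair.",
        ["The reranker outputs one relevance logit per pair."],
    ),
}
chunks, parents = build_index(sections)
result = retrieve_parents("is english supported", chunks, parents, overlap)
```

The query matches only the second claim of `sec-1`, yet the caller receives the whole section, so the surrounding context travels with the hit. In practice the scorer would be embedding similarity rather than word overlap.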
