How to use baa-ai/Merino-Small with sentence-transformers:
from sentence_transformers import CrossEncoder

model = CrossEncoder("baa-ai/Merino-Small")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
baa.ai · Merino-Small
One model that does both halves of RAG retrieval — bi-encoder embedding and cross-encoder reranking — over a single shared word-embedding table. A 384-dimensional English model, ~55M parameters, by BAA AI (Black Sheep AI).
Get the optimal model for your data
Merino-Small is a strong, cost-efficient default. But the best embedder + reranker is corpus-specific — the ideal choice depends on your documents and your notion of relevance. baa.ai offers exclusive tooling that identifies the optimal embedding and reranking models for your specific data, so you ship the smallest models that maximize document recovery on your corpus. For a tailored recommendation, reach out to baa.ai.
What it is
A two-role retrieval model whose bi-encoder embedder and 12-layer cross-encoder reranker share one input word-embedding matrix. Both are built on the same MiniLM-L6-H384-uncased backbone, so the word-embedding table is stored once and injected into the reranker at load. The result is a smaller download at no measured quality loss, with no retraining.
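The "stored once, injected at load" idea can be illustrated with a minimal sketch. This is not the repo's loader: the plain dicts stand in for real checkpoints, the key name `embeddings.word_embeddings.weight` and the helper names are hypothetical, and the 30,522-token vocabulary is an assumption (the usual BERT uncased size) that this card does not state.

```python
# Hypothetical key under which the word-embedding table is stored.
EMBED_KEY = "embeddings.word_embeddings.weight"


def inject_shared_embeddings(embedder_state, reranker_state, key=EMBED_KEY):
    """Return a reranker state that reuses the embedder's word-embedding table."""
    if key in reranker_state:
        raise ValueError("reranker already carries its own word-embedding table")
    merged = dict(reranker_state)       # shallow copy; other weights untouched
    merged[key] = embedder_state[key]   # same object, not a second copy
    return merged


def table_params(vocab_size, hidden_dim):
    """Number of parameters in a vocab_size x hidden_dim embedding table."""
    return vocab_size * hidden_dim


# Toy checkpoints: the reranker is saved WITHOUT its word-embedding table.
table = [[0.0] * 4 for _ in range(3)]
embedder_state = {EMBED_KEY: table, "encoder.layer.0.weight": [1.0]}
reranker_state = {"encoder.layer.0.weight": [2.0], "classifier.weight": [3.0]}

loaded = inject_shared_embeddings(embedder_state, reranker_state)
shared = loaded[EMBED_KEY] is embedder_state[EMBED_KEY]

# Parameters not duplicated, ASSUMING a 30,522-token vocabulary at 384 dims.
saved = table_params(vocab_size=30522, hidden_dim=384)
print(shared, saved)
```

Under that vocabulary assumption the table holds 11,720,448 parameters, which is the part of the reranker that would not need to be stored a second time.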
- Embed role: bi-encoder, 384-d, L2-normalized. Prepend "Represent this sentence for searching relevant passages: " to queries.
- Rerank role: cross-encoder, single relevance logit per (query, document) pair.
- Router: call .embed(...) or .rerank(...).
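To make the embed role concrete, here is a small sketch of what the query prefix and L2 normalization imply for first-stage ranking. It uses toy 3-d vectors rather than real 384-d embeddings, and the helper names (`prepare_query`, `l2_normalize`, `rank_by_similarity`) are illustrative, not part of this repo. For unit-length vectors the dot product equals cosine similarity, which is why normalized embeddings can be compared with a plain dot product.

```python
import math

# Prefix quoted from the list above; applied to queries only, not documents.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def prepare_query(text):
    """Prepend the retrieval instruction to a query string."""
    return QUERY_PREFIX + text


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted best-first by cosine similarity."""
    q = l2_normalize(query_vec)
    scored = [(i, dot(q, l2_normalize(d))) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model output.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 2.0, 0.0], [3.0, 4.0, 0.0], [5.0, 0.0, 0.0]]

ranking = rank_by_similarity(query_vec, doc_vecs)
print([i for i, _ in ranking])  # [2, 1, 0]
```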
Usage
from modeling_baa import BaaEmbeddingReranker  # included in this repo

m = BaaEmbeddingReranker("baa-ai/Merino-Small")

qv = m.embed(["how does a cross-encoder reranker work?"], is_query=True)[0]
dv = m.embed(["a cross-encoder scores a (query, document) pair jointly",
              "bi-encoders embed query and document separately for fast retrieval"])

ranked = m.rerank("how does a cross-encoder reranker work?",
                  ["a cross-encoder scores a (query, document) pair jointly",
                   "the mitochondria is the powerhouse of the cell"])
# -> [(doc, score), ...] sorted best-first
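The two calls above are typically chained: embed to shortlist candidates cheaply, then rerank only the shortlist. The sketch below shows one way to do that, assuming only what the usage snippet shows: `.embed(texts, is_query=...)` returns an indexable sequence of vectors, and `.rerank(query, docs)` returns `(doc, score)` pairs sorted best-first. `retrieve_then_rerank` is an illustrative helper, and `ToyModel` is a hypothetical stand-in with word-count "embeddings" so the example runs without downloading the model.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def retrieve_then_rerank(model, query, docs, top_k=2):
    """Shortlist docs by embedding similarity, then rerank the shortlist."""
    q_vec = model.embed([query], is_query=True)[0]
    d_vecs = model.embed(docs)
    order = sorted(range(len(docs)),
                   key=lambda i: cosine(q_vec, d_vecs[i]),
                   reverse=True)
    shortlist = [docs[i] for i in order[:top_k]]
    return model.rerank(query, shortlist)


class ToyModel:
    """Hypothetical stand-in exposing the same two methods as the real class."""

    VOCAB = ["cross-encoder", "reranker", "bi-encoders", "query", "cell"]

    def embed(self, texts, is_query=False):
        # Count occurrences of a few fixed terms; NOT a real embedding.
        return [[float(t.lower().count(w)) for w in self.VOCAB] for t in texts]

    def rerank(self, query, docs):
        # Score by shared whitespace tokens; NOT a real cross-encoder.
        q_words = set(query.lower().split())
        scored = [(d, float(len(q_words & set(d.lower().split())))) for d in docs]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "a cross-encoder scores a (query, document) pair jointly",
    "bi-encoders embed query and document separately for fast retrieval",
    "the mitochondria is the powerhouse of the cell",
]
results = retrieve_then_rerank(ToyModel(), "how does a cross-encoder reranker work?", docs)
print(results[0][0])
```

Swapping `ToyModel()` for the `BaaEmbeddingReranker` instance from the usage snippet should work if its return types match the assumptions stated here; the choice of `top_k` trades reranking cost against recall.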
Specs
| Property | Value |
| --- | --- |
| Embedding dim | 384 |
| Parameters | ~55M (embedder + reranker, shared word-embedding table) |
| Languages | English |
| Max sequence length | 512 |
| Hardware | CPU / edge / GPU |
License & attribution
- BAA Contributions (shared-embedding architecture, router/loader code, packaging, weights, docs) are proprietary to BAA AI (Black Sheep AI); see LICENSE.
- Incorporates the MiniLM-L6-H384-uncased backbone under the MIT License; see LICENSE-minilm.txt.
© 2026 BAA AI (Black Sheep AI) — baa.ai. Provided "as is" without warranty.