devdata-search-noinstruct-small-asym-cmnrl

A bi-encoder embedding model for search over structured statistical metadata, part of the DevData Search family. It is a fine-tune of avsolatorio/NoInstruct-small-Embedding-v0 produced with schema-invariant fine-tuning on DevDataBench: full-schema serialization with per-example field-order permutation and field dropout, so the encoder binds meaning to field labels rather than to serialization order. This is an embedding model that powers retrieval; it is not a hosted search service.

See the paper "Field Order Should Not Matter: Permutation-Invariant Fine-Tuning for Structured Metadata Retrieval".

Training

  • Base model: avsolatorio/NoInstruct-small-Embedding-v0
  • Loss: cmnrl
  • Field permutation: True; field dropout: 0.15
  • Max sequence length: 512
  • No query/document prefixes
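
The permutation and dropout settings above can be pictured with a minimal sketch. The `label: value | label: value` layout follows the document string in the Usage example; the helper name `serialize_record`, the sample field labels and values, and the rule of keeping at least one field are assumptions made for illustration, not the actual training code.

```python
import random

def serialize_record(fields, dropout=0.15, permute=True, rng=None):
    """Serialize a metadata record as 'label: value | label: value'.

    Hypothetical helper: each field is dropped independently with
    probability `dropout`; the surviving fields are shuffled if `permute`.
    """
    rng = rng or random.Random()
    items = list(fields.items())
    if not items:
        return ""
    # Field dropout: keep a field when the draw is at or above the dropout rate.
    kept = [kv for kv in items if rng.random() >= dropout]
    if not kept:
        # Assumption: never emit an empty string; fall back to one random field.
        kept = [rng.choice(items)]
    if permute:
        # Field permutation: a fresh order for every serialized example.
        rng.shuffle(kept)
    return " | ".join(f"{label}: {value}" for label, value in kept)

# Illustrative record; labels and values are not taken from DevDataBench.
record = {
    "name": "Active mobile-broadband subscriptions",
    "unit": "per 100 people",
    "periodicity": "annual",
}
print(serialize_record(record, rng=random.Random(0)))
```

Because each training example sees a different order and a different subset of fields, the position of a field carries no stable signal, which is the property the training description relies on.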

Usage

from sentence_transformers import SentenceTransformer

# Load the fine-tuned encoder from the Hugging Face Hub.
model = SentenceTransformer("ai4data/devdata-search-noinstruct-small-asym-cmnrl")
queries = ["mobile-broadband subscriptions per 100 people, reported annually"]
docs = ["name: Active mobile-broadband subscriptions | ..."]
# Queries and documents go through the same encode call; no prefixes are added.
q = model.encode(queries)
d = model.encode(docs)

To rank documents for a query, compute the cosine similarity between its embedding in q and every embedding in d, then sort the documents by descending score.
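
The ranking step can be sketched without the model by using toy vectors in place of the `model.encode` output. The function names `cosine` and `rank_documents` and the 2-d vectors are illustrative assumptions; in practice a vectorized routine such as `sentence_transformers.util.cos_sim` would typically replace the pure-Python loop.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_documents(query_vecs, doc_vecs):
    """For each query, return document indices sorted by descending similarity."""
    rankings = []
    for qv in query_vecs:
        scores = [cosine(qv, dv) for dv in doc_vecs]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        rankings.append(order)
    return rankings

# Toy 2-d vectors standing in for the embeddings q and d.
toy_q = [[1.0, 0.0]]
toy_d = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(rank_documents(toy_q, toy_d))  # [[1, 0, 2]]
```

The same functions accept the arrays returned by `model.encode`, since they only iterate over rows and elements.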

License

Apache-2.0. Derived from avsolatorio/NoInstruct-small-Embedding-v0; trained on public World Bank Data360 metadata.

Model size: 33.4M params (Safetensors, tensor type F32)