SearchLM NL2BM25 — SFT v1 (Qwen2.5-3B-Instruct)

Part of the SearchLM collection · GitHub

A Qwen2.5-3B-Instruct model fine-tuned via LoRA SFT to convert natural language queries into Tantivy boolean search queries with explicit chain-of-thought reasoning. This is the warm-start checkpoint before GRPO reinforcement learning.

Pipeline position: base → SFT v1 → GRPO v1 (⚠️ reward hacking) → SFT v2 → GRPO v2 ✅

For the best retrieval model, use GRPO v2.


What it does

The model outputs a structured two-part response for any natural language information need:

Input:

Do statins cause breast cancer?

Output:

<reasoning>
Key concepts:
1. Statin drugs — synonyms: statin, HMG-CoA reductase inhibitor, simvastatin, atorvastatin,
   lovastatin, pravastatin, rosuvastatin
2. Causal relationship — cause, risk, association, induce, increase risk
3. Breast cancer — "breast cancer", "breast carcinoma", "breast neoplasm", "mammary carcinoma"

Strategy: AND the three concept groups; OR synonyms within each group.
Phrase-quote multi-word terms to prevent term splitting.
</reasoning>
<query>(statin OR "HMG-CoA reductase inhibitor" OR simvastatin OR atorvastatin OR lovastatin)
AND (cause OR risk OR association OR "induce" OR "increase risk")
AND ("breast cancer" OR "breast carcinoma" OR "breast neoplasm")</query>

The <query> block is valid Tantivy boolean syntax ready to pass directly to a search engine.
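To hand the query to a search engine, the `<query>` block first has to be separated from the reasoning. The following is a minimal sketch of that step, not part of the released code: `extract_query` and `QUERY_RE` are hypothetical names, and the whitespace collapsing assumes that line breaks inside the query carry no meaning.

```python
import re

# Hypothetical helper: captures the text between <query> and </query>.
QUERY_RE = re.compile(r"<query>(.*?)</query>", re.DOTALL)


def extract_query(response):
    """Return the boolean query as a single line, or None if no block is found."""
    match = QUERY_RE.search(response)
    if match is None:
        return None
    # Collapse the line breaks the model may emit between clauses.
    return " ".join(match.group(1).split())


sample = """<reasoning>
Strategy: AND the concept groups; OR synonyms within each group.
</reasoning>
<query>(statin OR simvastatin)
AND ("breast cancer" OR "breast carcinoma")</query>"""

query = extract_query(sample)
print(query)
```

Returning `None` instead of raising lets the caller decide how to handle a response that lacks the block, for example by falling back to the raw natural language query.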


All SearchLM checkpoints

| Model | NFCorpus NDCG@10 | SciFact NDCG@10 | Mean tokens | Boolean ops |
|---|---|---|---|---|
| base (Qwen2.5-3B-Instruct) | 0.455 | 0.386 | 120 | ~20% |
| SFT v1 | 0.441 | 0.273 | 95 | ~80% |
| GRPO v1 ⚠️ | 0.556 | 0.608 | 5–7 | 0% |
| SFT v2 | 0.466 | 0.358 | 109 | ~65% |
| GRPO v2 | 0.577 | 0.657 | 147 | ~35% |

Evaluated on BEIR test splits (NFCorpus: 323 queries, SciFact: 300 queries).
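For readers unfamiliar with the metric, here is a rough sketch of how NDCG@10 can be computed for one query. It assumes linear gains and a log2 rank discount; the exact gain function used by the evaluation harness is not stated here, so treat this as illustrative. The function names and the toy `qrels` are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 is undiscounted)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query; qrels maps doc id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Toy judgments (made up for illustration).
qrels = {"d1": 2, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], qrels)
swapped = ndcg_at_k(["d2", "d1", "d3"], qrels)
empty = ndcg_at_k([], qrels)
```

A query that retrieves nothing relevant in its top 10 scores exactly 0, which is the case the `ndcg_at_10 = 0` training examples fall into.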

SFT v1 scores slightly below base on NFCorpus (0.441 vs. 0.455) and well below it on SciFact (0.273 vs. 0.386). About 36% of the training examples had ndcg_at_10 = 0: their queries parsed fine but retrieved no relevant documents, so they taught the model syntactically correct but semantically wrong boolean structure. SFT v2 fixes this with a quality filter.
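A quality filter of this kind can be sketched in a few lines. The actual threshold and record layout used for SFT v2 are not given here, so everything below other than the `ndcg_at_10` field name is an assumption made for illustration.

```python
def drop_zero_score_examples(examples, min_ndcg=0.0):
    """Keep only examples whose recorded ndcg_at_10 is strictly above min_ndcg."""
    return [ex for ex in examples if ex.get("ndcg_at_10", 0.0) > min_ndcg]


# Made-up records; real training rows would also carry the target response.
examples = [
    {"query": "statin breast cancer", "ndcg_at_10": 0.62},
    {"query": "coral reef bleaching", "ndcg_at_10": 0.0},
    {"query": "vitamin d bone density", "ndcg_at_10": 0.31},
]

kept = drop_zero_score_examples(examples)
print(f"kept {len(kept)} of {len(examples)} examples")
```

Raising `min_ndcg` above zero would trade dataset size for cleaner supervision; which trade-off SFT v2 made is not stated in this card.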


Training Details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | LoRA SFT (r=16, α=32), adapter merged into base |
| Target modules | q/k/v/o projections + gate/up/down projections |
| Training data | Supreeth/nl2bm25-sft (4,999 examples) |
| Source BEIR datasets | NFCorpus, SciFact, FiQA-2018, ArguAna, HotpotQA, NQ |
| Data generation | GPT-4o / Llama-3.3-70B / Qwen2.5-72B cycling via NVIDIA NIM |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine decay, 5% warmup) |
| Effective batch size | 16 (2 × 8 grad accum) |
| Max sequence length | 1,024 tokens |
| Hardware | NVIDIA A10G 24 GB |
| Training time | ~30 min |
| Final loss | ~0.23 |
| Token accuracy | ~94% |
| W&B run | supreethrao/searchlm |
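The schedule implied by these settings can be worked out by hand. The sketch below assumes one optimizer step per 16 examples and a linear warmup followed by a half-cosine decay to zero; the step counts reported by the actual trainer may differ by one depending on how the last partial batch is handled.

```python
import math

NUM_EXAMPLES = 4_999
EFFECTIVE_BATCH = 2 * 8      # per-device batch 2 x 8 gradient accumulation steps
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

# One epoch, so the total step count is the number of effective batches.
total_steps = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run is short, a few hundred updates, which is consistent with the ~30 minute training time on a single A10G.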

Training data distribution

| Source dataset | Queries | Doc count |
|---|---|---|
| NFCorpus | ~700 | 3,633 |
| SciFact | ~500 | 5,183 |
| FiQA-2018 | ~1,600 | 57,638 |
| ArguAna | ~800 | 8,674 |
| HotpotQA | ~800 | 5,233,329 |
| NQ | ~599 | 2,681,468 |

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Supreeth/searchlm-nl2bm25-sft",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Supreeth/searchlm-nl2bm25-sft")

SYSTEM_PROMPT = """You are an expert information retrieval specialist. Convert the \
natural language query into a Tantivy boolean search query.

Output format (strictly follow this):
<reasoning>
Step-by-step concept extraction and synonym expansion.
</reasoning>
<query>your boolean query here</query>"""

nl_query = "effects of climate change on coral reef ecosystems"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Convert to a Tantivy boolean search query:\n\n{nl_query}"},
]
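# Render the messages with the model's chat template, then tokenize the prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sample up to 512 new tokens at temperature 0.7.
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))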

Tantivy Boolean Syntax

Tantivy is a full-text search engine library. The model targets its query language:

| Construct | Syntax | Example |
|---|---|---|
| Single term | word | cancer |
| Exact phrase | "phrase" | "bone density" |
| AND | A AND B | vitamin AND calcium |
| OR | A OR B | cancer OR tumor OR malignancy |
| NOT | NOT A | NOT review |
| Grouping | (A OR B) | (cat OR feline) AND behavior |
| Field scope | field:term | title:"machine learning" |
| Boost | term^N | cancer^2 OR tumor |
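The constructs above combine in the pattern the model's reasoning describes: OR the synonyms inside each concept group, AND the groups together, and phrase-quote multi-word terms. The helper below is a hypothetical illustration of that pattern, not code from the SearchLM repository, and it does not escape terms that already contain quotes or other special characters.

```python
def quote_term(term):
    """Phrase-quote multi-word terms so they are not split into separate words."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_boolean_query(concept_groups):
    """OR the terms inside each group, then AND the groups together."""
    clauses = []
    for group in concept_groups:
        terms = [quote_term(t) for t in group if t.strip()]
        if terms:
            clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


query = build_boolean_query([
    ["statin", "simvastatin"],
    ["cause", "risk"],
    ["breast cancer", "breast carcinoma"],
])
print(query)
```

Wrapping every group in parentheses, even single-term ones, keeps operator precedence explicit and avoids relying on how the parser binds AND versus OR.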

Citation

@misc{searchlm2026,
  title  = {SearchLM: Training Small Language Models for Boolean Query Generation via RLVR},
  author = {Rao, Supreeth},
  year   = {2026},
  url    = {https://github.com/SupreethRao99/searchLM},
}