SearchLM NL2BM25 — SFT v2 Quality-Filtered (Qwen2.5-3B-Instruct)

Part of the SearchLM collection · GitHub

A quality-filtered LoRA SFT warm-start. v2 keeps only training examples where the LLM-generated boolean query actually retrieved at least one relevant document (ndcg_at_10 > 0), eliminating the ~65% of v1's data that taught syntactically correct but semantically useless boolean structure.

This checkpoint is the starting point for GRPO v2, the best-performing SearchLM checkpoint.

Pipeline position: base → SFT v1 → GRPO v1 (⚠️) → SFT v2 → GRPO v2 ✅
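
The quality filter described above can be sketched in a few lines. This is a minimal illustration, not the author's pipeline: it assumes each training example is a dict carrying an `ndcg_at_10` score (the field named in this card), while the `nl_query` and `query` field names and the sample rows are hypothetical.

```python
# Sketch of the v2 quality filter: keep only examples whose generated
# boolean query retrieved at least one relevant document.
# `ndcg_at_10` is the field named in this card; the other field names
# and the sample rows are hypothetical.

def filter_by_quality(examples, min_ndcg=0.0):
    """Return the examples whose ndcg_at_10 is strictly above min_ndcg."""
    return [ex for ex in examples if ex["ndcg_at_10"] > min_ndcg]

examples = [
    {"nl_query": "a", "query": "x AND y", "ndcg_at_10": 0.0},
    {"nl_query": "b", "query": "x OR y", "ndcg_at_10": 0.42},
    {"nl_query": "c", "query": '"x y"', "ndcg_at_10": 1.0},
]

kept = filter_by_quality(examples)
retention = len(kept) / len(examples)
print(f"kept {len(kept)} of {len(examples)} ({retention:.0%})")
```

Applied to the real data, the same rule is what reduces 4,999 examples to 1,751.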

Why quality filtering matters

SFT v1 trained on 4,999 examples, ~65% of which had ndcg_at_10 = 0. These examples taught the model to produce complex-looking queries that retrieved no relevant documents. SciFact was hit hardest: SFT v1 dropped below base (0.273 vs 0.386) because scientific terminology requires precision, and over-specified AND chains returned nothing.

Before (SFT v1 — query returns zero results):

<query>("ALDH1" OR "aldehyde dehydrogenase 1" OR "ALDH1A1")
AND ("breast cancer" OR "mammary carcinoma" OR "breast neoplasm")
AND (expression OR "gene expression" OR overexpression)
AND (outcome OR prognosis OR survival OR "disease-free survival")
AND (better OR improved OR favorable OR positive)</query>

After (SFT v2 — learned from working examples only):

<query>("ALDH1" OR "aldehyde dehydrogenase 1")
AND ("breast cancer" OR "breast neoplasm")
AND (expression OR overexpression)
AND (outcome OR prognosis OR survival)</query>

Fewer AND clauses → Tantivy returns documents → model receives training signal.
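
To see why dropping a clause matters, here is a toy conjunctive matcher. It is not Tantivy and does no tokenization or scoring; it only assumes that a document must satisfy every AND group and that a group is satisfied by any one of its OR terms. The sample document is invented for illustration, and the v1-style query reuses the v2 terms plus the fifth group for brevity.

```python
# Toy model of an AND-of-ORs query: a document matches only if every
# group contributes at least one term. Substring matching stands in for
# real tokenization; this is an illustration, not Tantivy's behavior.

def matches(doc, groups):
    text = doc.lower()
    return all(any(term.lower() in text for term in group) for group in groups)

v2_groups = [
    ["ALDH1", "aldehyde dehydrogenase 1"],
    ["breast cancer", "breast neoplasm"],
    ["expression", "overexpression"],
    ["outcome", "prognosis", "survival"],
]
# The v1-style query adds one more mandatory group.
v1_groups = v2_groups + [["better", "improved", "favorable", "positive"]]

# Hypothetical document, written for this example.
doc = "ALDH1 expression is associated with poor prognosis in breast cancer."

print(matches(doc, v1_groups))  # False: no term from the fifth group appears
print(matches(doc, v2_groups))  # True
```

Extra OR terms can only widen a group, whereas every extra AND group is another condition a relevant document can fail.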

All SearchLM checkpoints

| Model | NFCorpus NDCG@10 | SciFact NDCG@10 | Mean tokens | Boolean ops |
| --- | --- | --- | --- | --- |
| base (Qwen2.5-3B-Instruct) | 0.455 | 0.386 | 120 | ~20% |
| SFT v1 | 0.441 | 0.273 | 95 | ~80% |
| GRPO v1 ⚠️ | 0.556 | 0.608 | 5–7 | 0% |
| SFT v2 | 0.466 | 0.358 | 109 | ~65% |
| GRPO v2 ✅ | 0.577 | 0.657 | 147 | ~35% |

Evaluated on BEIR test splits (NFCorpus: 323 queries, SciFact: 300 queries).
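
For reference, NDCG@10 for a single query can be computed as below. This sketch uses the linear-gain form (relevance divided by log2 of rank + 1) and builds the ideal ranking from all judged documents for the query. Whether the SearchLM evaluation uses exactly this variant is an assumption, and the relevance labels are made up.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents, in rank order.
    all_relevances: relevance labels of every judged document for the query.
    """
    ideal = sorted(all_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg

# Hypothetical query with two relevant documents, one retrieved at rank 2.
score = ndcg_at_k([0, 1, 0], all_relevances=[1, 1])
print(round(score, 3))
```

A query that places no relevant document in its top 10 scores exactly 0, which is the case the `ndcg_at_10 > 0` filter removes.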

SFT v1 vs SFT v2

| | SFT v1 | SFT v2 |
| --- | --- | --- |
| Training examples | 4,999 | 1,751 (35% of v1) |
| Quality filter | all syntax-valid | `ndcg_at_10 > 0` |
| NFCorpus NDCG@10 | 0.441 | 0.466 (+0.025) |
| SciFact NDCG@10 | 0.273 | 0.358 (+0.085) |
| Training time (A10G) | ~30 min | ~22 min |
| Final loss | ~0.23 | ~0.24 |

SciFact gained the most (+0.085) because over-specification hurts it most: its documents are matched through narrow scientific terminology, so queries need tighter formulation and each unnecessary AND clause risks excluding the relevant ones.

Training Details

| Setting | Value |
| --- | --- |
| Base model | `Qwen/Qwen2.5-3B-Instruct` |
| Method | LoRA SFT (r=16, α=32), adapter merged into base |
| Target modules | q/k/v/o projections + gate/up/down projections |
| Training data | Supreeth/nl2bm25-sft filtered: `ndcg_at_10 > 0` |
| Retained / total | 1,751 / 4,999 (35%) |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine decay, 5% warmup) |
| Effective batch size | 16 (2 × 8 grad accum) |
| Max sequence length | 1,024 tokens |
| Hardware | NVIDIA A10G 24 GB |
| Training time | ~22 min |
| Final loss | ~0.24 |
| Token accuracy | ~93.8% |
| W&B run | `supreethrao/searchlm/runs/k00s9ype` |
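
The learning-rate schedule in the table can be sketched as follows. Only the peak rate, the 5% warmup, the batch size, and the example count come from the table. The step count, the linear warmup shape, decay to zero, and reading "2 × 8" as per-device batch 2 with 8 accumulation steps are assumptions that mirror a common cosine-with-warmup setup.

```python
import math

PEAK_LR = 2e-4           # from the table
WARMUP_FRACTION = 0.05   # from the table
NUM_EXAMPLES = 1751      # from the table
EFFECTIVE_BATCH = 2 * 8  # assumed: batch 2 x 8 gradient-accumulation steps

# Assumption: the final partial batch still counts as an optimizer step.
TOTAL_STEPS = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
WARMUP_STEPS = math.ceil(WARMUP_FRACTION * TOTAL_STEPS)

def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(TOTAL_STEPS, WARMUP_STEPS)
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{learning_rate(step):.2e}")
```

Under these assumptions a single epoch is roughly 110 optimizer steps, so the warmup covers only the first handful of updates.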

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint. device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    "Supreeth/searchlm-nl2bm25-sft-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Supreeth/searchlm-nl2bm25-sft-v2")

# System prompt that fixes the <reasoning>/<query> output format.
SYSTEM_PROMPT = """You are an expert information retrieval specialist. Convert the \
natural language query into a Tantivy boolean search query.

Output format (strictly follow this):
<reasoning>
Step-by-step concept extraction and synonym expansion.
</reasoning>
<query>your boolean query here</query>"""

nl_query = "effects of climate change on coral reef ecosystems"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Convert to a Tantivy boolean search query:\n\n{nl_query}"},
]
# Render the chat template to text, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sampling is enabled, so the output varies between runs.
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
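
The generation is free text containing the two tagged sections, so downstream code needs to pull out the query. One way to do that, assuming the model followed the output format in the system prompt (the helper name `extract_query` and the sample output are hypothetical):

```python
import re

QUERY_PATTERN = re.compile(r"<query>(.*?)</query>", re.DOTALL)

def extract_query(generation):
    """Return the text inside the first <query>...</query> pair, or None."""
    match = QUERY_PATTERN.search(generation)
    if match is None:
        return None
    # Collapse line breaks so the query can be passed on as a single line.
    return " ".join(match.group(1).split())

# Hypothetical model output in the expected format.
sample = """<reasoning>
Concepts: climate change, coral reefs.
</reasoning>
<query>("climate change" OR "global warming")
AND ("coral reef" OR "coral reefs")</query>"""

print(extract_query(sample))
```

Returning `None` when the tags are missing lets the caller decide whether to retry generation or fall back to the raw natural language query.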

Tantivy Boolean Syntax

Tantivy is a full-text search engine library. The model targets its query language:

| Construct | Syntax | Example |
| --- | --- | --- |
| Single term | `word` | `cancer` |
| Exact phrase | `"phrase"` | `"bone density"` |
| AND | `A AND B` | `vitamin AND calcium` |
| OR | `A OR B` | `cancer OR tumor OR malignancy` |
| NOT | `NOT A` | `NOT review` |
| Grouping | `(A OR B)` | `(cat OR feline) AND behavior` |
| Field scope | `field:term` | `title:"machine learning"` |
| Boost | `term^N` | `cancer^2 OR tumor` |
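
A small helper makes the table concrete by assembling a query from concept groups. It only uses the phrase, OR, AND, and grouping constructs listed above. The function name and inputs are hypothetical, and the sketch does not escape special characters.

```python
def build_boolean_query(concept_groups):
    """Join synonyms with OR inside each group and join groups with AND.

    Multi-word terms are wrapped in double quotes so they are treated as
    exact phrases. Special characters are not escaped in this sketch.
    """
    def render(term):
        return f'"{term}"' if " " in term else term

    clauses = []
    for group in concept_groups:
        rendered = " OR ".join(render(term) for term in group)
        # Parenthesize multi-term groups so OR is evaluated before AND.
        clauses.append(f"({rendered})" if len(group) > 1 else rendered)
    return " AND ".join(clauses)

query = build_boolean_query([
    ["cat", "feline"],
    ["behavior"],
    ["bone density", "osteoporosis"],
])
print(query)
# (cat OR feline) AND behavior AND ("bone density" OR osteoporosis)
```

This mirrors the shape of the model's outputs shown earlier: one parenthesized OR group per concept, joined by AND.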

Related resources

Dataset: Supreeth/nl2bm25-sft
Next step: GRPO v2 — reinforcement learning from this checkpoint
Code: SupreethRao99/searchLM
Analysis: Reward hacking report
Collection: SearchLM collection

Citation

@misc{searchlm2026,
  title  = {SearchLM: Training Small Language Models for Boolean Query Generation via RLVR},
  author = {Rao, Supreeth},
  year   = {2026},
  url    = {https://github.com/SupreethRao99/searchLM},
}