SearchLM NL2BM25 — SFT v1 (Qwen2.5-3B-Instruct)

Part of the SearchLM collection · GitHub

A Qwen2.5-3B-Instruct model fine-tuned via LoRA SFT to convert natural language queries into Tantivy boolean search queries with explicit chain-of-thought reasoning. This is the warm-start checkpoint before GRPO reinforcement learning.

Pipeline position: base → SFT v1 → GRPO v1 (⚠️ reward hacking) → SFT v2 → GRPO v2 ✅

For the best retrieval model, use GRPO v2.

What it does

The model outputs a structured two-part response for any natural language information need:

Input:

Do statins cause breast cancer?

Output:

<reasoning>
Key concepts:
1. Statin drugs — synonyms: statin, HMG-CoA reductase inhibitor, simvastatin, atorvastatin,
   lovastatin, pravastatin, rosuvastatin
2. Causal relationship — cause, risk, association, induce, increase risk
3. Breast cancer — "breast cancer", "breast carcinoma", "breast neoplasm", "mammary carcinoma"

Strategy: AND the three concept groups; OR synonyms within each group.
Phrase-quote multi-word terms to prevent term splitting.
</reasoning>
<query>(statin OR "HMG-CoA reductase inhibitor" OR simvastatin OR atorvastatin OR lovastatin)
AND (cause OR risk OR association OR "induce" OR "increase risk")
AND ("breast cancer" OR "breast carcinoma" OR "breast neoplasm")</query>

The <query> block is valid Tantivy boolean syntax ready to pass directly to a search engine.
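
Before handing the output to a search engine, the two tagged parts have to be separated. The following is a minimal sketch of one way to do that, assuming the response follows the format shown above. The helper names `parse_response` and `looks_balanced` are hypothetical, and the balance check is only a cheap sanity test, not Tantivy's own parser.

```python
import re

# Hypothetical helpers; the tag names come from the output format shown above.
_TAG = r"<{0}>(.*?)</{0}>"

def parse_response(text: str) -> dict:
    """Return the reasoning and query strings, or None for a missing part."""
    parts = {}
    for tag in ("reasoning", "query"):
        match = re.search(_TAG.format(tag), text, flags=re.DOTALL)
        parts[tag] = match.group(1).strip() if match else None
    # Collapse line breaks inside the query so it becomes a single-line string.
    if parts["query"] is not None:
        parts["query"] = " ".join(parts["query"].split())
    return parts

def looks_balanced(query: str) -> bool:
    """Check that parentheses balance outside quotes and that quotes pair up."""
    depth, in_quote = 0, False
    for ch in query:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and not in_quote

response = """<reasoning>
Two concept groups joined with AND.
</reasoning>
<query>(statin OR simvastatin)
AND ("breast cancer" OR "breast carcinoma")</query>"""

parsed = parse_response(response)
print(parsed["query"])
print(looks_balanced(parsed["query"]))
```

A response that fails either step (missing `<query>` tag, unbalanced parentheses or quotes) is probably best rejected or retried rather than sent to the engine.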

All SearchLM checkpoints

| Model | NFCorpus NDCG@10 | SciFact NDCG@10 | Mean tokens | Boolean ops |
|---|---|---|---|---|
| base (Qwen2.5-3B-Instruct) | 0.455 | 0.386 | 120 | ~20% |
| SFT v1 | 0.441 | 0.273 | 95 | ~80% |
| GRPO v1 ⚠️ | 0.556 | 0.608 | 5–7 | 0% |
| SFT v2 | 0.466 | 0.358 | 109 | ~65% |
| GRPO v2 ✅ | 0.577 | 0.657 | 147 | ~35% |

Evaluated on BEIR test splits (NFCorpus: 323 queries, SciFact: 300 queries).
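
For readers unfamiliar with the metric, here is a small sketch of NDCG@10 for a single query, using one common formulation (linear gain, log2 rank discount). It is an illustration only, not the evaluation code behind the table, and the document ids in the toy example are made up.

```python
import math

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """NDCG@k with linear gain and a log2 rank discount.

    ranked_doc_ids: doc ids in the order the search engine returned them.
    relevance: dict mapping doc id -> graded relevance (missing ids count as 0).
    """
    def dcg(gains):
        # Rank is 0-based here, so the discount for the top hit is log2(2) = 1.
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example with made-up ids: one of two relevant documents, found at rank 2.
qrels = {"d1": 1, "d7": 1}
print(round(ndcg_at_k(["d3", "d1", "d9"], qrels), 3))
```

A query whose top 10 results contain no relevant document scores exactly 0, which is the case the `ndcg_at_10 = 0` training examples fall into.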

SFT v1 scores slightly below base on NFCorpus (0.441 vs. 0.455) and well below it on SciFact (0.273 vs. 0.386). Roughly 36% of the training examples had ndcg_at_10 = 0, so the model learned boolean structure that was syntactically correct but semantically wrong: queries that parsed fine but retrieved no relevant documents in the top 10. SFT v2 fixes this with a quality filter.
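
The exact criteria of the SFT v2 quality filter are not given here. As a rough sketch of the idea, the snippet below drops training examples whose `ndcg_at_10` is zero. The records, the `quality_filter` helper, and the threshold are assumptions for illustration; only the `ndcg_at_10` field name comes from the text above.

```python
# Hypothetical records; only the ndcg_at_10 field name comes from the text above.
examples = [
    {"query": "do statins cause breast cancer", "ndcg_at_10": 0.62},
    {"query": "coral reef bleaching", "ndcg_at_10": 0.0},
    {"query": "vitamin d and bone density", "ndcg_at_10": 0.18},
]

def quality_filter(records, min_ndcg=0.0):
    """Keep records whose retrieval score is strictly above min_ndcg (assumed rule)."""
    return [r for r in records if r["ndcg_at_10"] > min_ndcg]

kept = quality_filter(examples)
dropped_share = 1 - len(kept) / len(examples)
print(f"kept {len(kept)} of {len(examples)} examples, dropped {dropped_share:.0%}")
```

Raising `min_ndcg` would trade training-set size for stricter quality; the right value is an empirical question.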

Training Details

| Setting | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-3B-Instruct` |
| Method | LoRA SFT (r=16, α=32), adapter merged into base |
| Target modules | q/k/v/o projections + gate/up/down projections |
| Training data | Supreeth/nl2bm25-sft — 4,999 examples |
| Source BEIR datasets | NFCorpus, SciFact, FiQA-2018, ArguAna, HotpotQA, NQ |
| Data generation | GPT-4o / Llama-3.3-70B / Qwen2.5-72B cycling via NVIDIA NIM |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine decay, 5% warmup) |
| Effective batch size | 16 (2 × 8 grad accum) |
| Max sequence length | 1,024 tokens |
| Hardware | NVIDIA A10G 24 GB |
| Training time | ~30 min |
| Final loss | ~0.23 |
| Token accuracy | ~94% |
| W&B run | `supreethrao/searchlm` |
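
To make the schedule concrete, the sketch below derives the number of optimizer steps from the table and evaluates a cosine schedule with warmup at a few points. It assumes linear warmup, decay to zero, and that the final partial batch is kept; none of these details are stated in the table, so treat the numbers as approximate.

```python
import math

NUM_EXAMPLES = 4999      # training examples, from the table
EFFECTIVE_BATCH = 16     # 2 x 8 gradient accumulation steps
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

# Assumption: the last partial batch is kept, so the step count is rounded up.
total_steps = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed floor)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions one epoch is about 313 optimizer steps, of which the first 16 are warmup.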

Training data distribution

| Source dataset | Queries | Doc count |
|---|---|---|
| NFCorpus | ~700 | 3,633 |
| SciFact | ~500 | 5,183 |
| FiQA-2018 | ~1,600 | 57,638 |
| ArguAna | ~800 | 8,674 |
| HotpotQA | ~800 | 5,233,329 |
| NQ | ~599 | 2,681,468 |

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged SFT checkpoint. device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "Supreeth/searchlm-nl2bm25-sft",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Supreeth/searchlm-nl2bm25-sft")

# System prompt that requests the <reasoning>/<query> output format.
SYSTEM_PROMPT = """You are an expert information retrieval specialist. Convert the \
natural language query into a Tantivy boolean search query.

Output format (strictly follow this):
<reasoning>
Step-by-step concept extraction and synonym expansion.
</reasoning>
<query>your boolean query here</query>"""

nl_query = "effects of climate change on coral reef ecosystems"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Convert to a Tantivy boolean search query:\n\n{nl_query}"},
]
# Render the chat template to text, then tokenize and move tensors to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sample a response; 512 new tokens leaves room for the reasoning and the query.
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Tantivy Boolean Syntax

Tantivy is a full-text search engine library. The model targets its query language:

| Construct | Syntax | Example |
|---|---|---|
| Single term | `word` | `cancer` |
| Exact phrase | `"phrase"` | `"bone density"` |
| AND | `A AND B` | `vitamin AND calcium` |
| OR | `A OR B` | `cancer OR tumor OR malignancy` |
| NOT | `NOT A` | `NOT review` |
| Grouping | `(A OR B)` | `(cat OR feline) AND behavior` |
| Field scope | `field:term` | `title:"machine learning"` |
| Boost | `term^N` | `cancer^2 OR tumor` |
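
The constructs in the table combine in the pattern the model's reasoning describes: OR the synonyms within a concept group, AND the groups, and phrase-quote multi-word terms. The sketch below assembles such a query string by hand. `quote_term` and `build_query` are hypothetical helpers for illustration, not part of Tantivy or of this model, and they do not escape terms that already contain quotes or other special characters.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term

def build_query(concept_groups) -> str:
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = []
    for group in concept_groups:
        terms = [quote_term(t) for t in group if t.strip()]
        if not terms:
            continue  # skip empty groups instead of emitting "()"
        joined = " OR ".join(terms)
        # Parenthesize only when a group has more than one alternative.
        clauses.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(clauses)

groups = [
    ["cat", "feline"],
    ["behavior"],
    ["machine learning", "deep learning"],
]
print(build_query(groups))
```

This mirrors the Grouping and Exact phrase rows of the table; field scopes and boosts could be layered on in the same way.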

Related resources

Dataset: Supreeth/nl2bm25-sft
Code: SupreethRao99/searchLM
Analysis: Reward hacking report
Collection: SearchLM collection

Citation

@misc{searchlm2026,
  title  = {SearchLM: Training Small Language Models for Boolean Query Generation via RLVR},
  author = {Rao, Supreeth},
  year   = {2026},
  url    = {https://github.com/SupreethRao99/searchLM},
}