Slonik-7B-GRPO

A PostgreSQL and SQLite text-to-SQL model, fine-tuned from Qwen2.5-Coder-7B-Instruct via QLoRA SFT followed by 2000-step GRPO with execution-based rewards.

Related repos:

Phani-labs/Slonik-7B-SFT — the SFT-only baseline (33.20% BIRD-PG)
Phani-labs/Slonik-7B-GRPO-GGUF — GGUF quantizations for Ollama / llama.cpp / LM Studio

Why I built this

I wanted a small text-to-SQL model that handled real PostgreSQL and SQLite questions — JSONB access, pgvector similarity, full-text search, window functions, deep CTEs — and was small enough to run locally. SFT alone got partway there (33.20% on BIRD-PG, already past most open 7B and 32B baselines), but the model still produced syntactically clean queries that referenced columns the schema didn't have. That's the pattern execution-based RL is built to fix.

This is the GRPO version. It scores 5.0 points higher than the SFT baseline on BIRD-PG (38.20% vs 33.20% execution accuracy) and shows a clear pattern of improvement on dialect-specific issues.

Results

BIRD Mini-Dev (500-example official benchmarks, execution accuracy):

Model	BIRD-PG	BIRD-SQLite	Size / type
o3-mini	47.78%	—	reasoning
Claude 3.7 Sonnet	39.26%	—	proprietary
Slonik-7B-GRPO (this)	38.20%	45.20%	7B
GPT-4o	34.44%	—	proprietary
Slonik-7B-SFT (sibling)	33.20%	—	7B
Qwen2.5-Coder-32B	22.96%	—	32B
Codestral 22B	21.11%	—	22B
Qwen2.5-Coder-7B (base)	12.22%	—	7B

Performance by difficulty

Tier	BIRD-PG	BIRD-SQLite
Simple	56.1%	66.2%
Moderate	33.6%	38.0%
Challenging	23.5%	32.4%

SFT → GRPO trajectory on BIRD-PG

Stage	Overall	Simple	Moderate	Challenging
Base Qwen2.5-Coder-7B	12.22%	—	—	—
Slonik-7B-SFT	33.20%	48.6%	29.6%	19.6%
Slonik-7B-GRPO (500 steps)	34.60%	49.3%	31.2%	21.6%
Slonik-7B-GRPO (2000 steps)	38.20%	56.1%	33.6%	23.5%

The largest absolute gains over SFT were on the simple tier (+7.5 pts) and the moderate tier (+4.0 pts). The challenging tier moved slightly less (+3.9 pts), which lines up with what 7B models can do given short context budgets.
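The per-tier gains can be recomputed directly from the trajectory table. The snippet below is only a small arithmetic check; the dictionary and function names are illustrative and not part of any released code.

```python
# BIRD-PG execution accuracy (%), copied from the SFT and GRPO (2000 steps) rows.
sft = {"overall": 33.20, "simple": 48.6, "moderate": 29.6, "challenging": 19.6}
grpo_2000 = {"overall": 38.20, "simple": 56.1, "moderate": 33.6, "challenging": 23.5}


def gains(before, after):
    """Absolute gain per tier in percentage points, rounded to one decimal."""
    return {tier: round(after[tier] - before[tier], 1) for tier in before}


tier_gains = gains(sft, grpo_2000)
for tier, delta in tier_gains.items():
    print(f"{tier:<12} {delta:+.1f} pts")
```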

Training

Two stages on a single RTX 5080 Laptop GPU (16 GB VRAM, Blackwell sm_120). Total external cost about $3 (DeepSeek API for synthetic data generation).

Stage 1 — QLoRA SFT (8h 13min)

QLoRA fine-tune of Qwen2.5-Coder-7B-Instruct on 21,847 text-to-SQL pairs:

BIRD-SQL train split — 6,601 examples
Spider train split — 8,034 examples
Gretel synthetic text-to-SQL PostgreSQL subset — 5,212 examples
PG-Modern custom synthesis — 2,000 examples covering pgvector, JSONB, full-text search, CTEs, window functions, and array operations

LoRA rank 32, alpha 64, 4-bit NF4 base, LR 1e-5, max_grad_norm 0.5, adamw_torch_fused. Final eval_loss 0.290.
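For orientation, the stated hyperparameters can be written down as keyword arguments. This is a sketch, not the actual training script: the grouping and the keyword names follow the usual `peft.LoraConfig`, `transformers.BitsAndBytesConfig`, and `transformers.TrainingArguments` conventions, which is an assumption about how the run was configured. Only the values come from the text above.

```python
# Dataset mix for Stage 1 (example counts from the list above).
dataset_sizes = {
    "bird_sql_train": 6601,
    "spider_train": 8034,
    "gretel_postgres_subset": 5212,
    "pg_modern_custom": 2000,
}

# Values are from the model card; the keyword names are assumed to match
# peft.LoraConfig, transformers.BitsAndBytesConfig and TrainingArguments.
lora_kwargs = {"r": 32, "lora_alpha": 64}
quant_kwargs = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
trainer_kwargs = {
    "learning_rate": 1e-5,
    "max_grad_norm": 0.5,
    "optim": "adamw_torch_fused",
}


def lora_scale(cfg):
    """Standard LoRA scaling factor alpha / r applied to the adapter update."""
    return cfg["lora_alpha"] / cfg["r"]


print("total pairs:", sum(dataset_sizes.values()))
print("LoRA scale:", lora_scale(lora_kwargs))
```

With rank 32 and alpha 64 the adapter update is scaled by 2, and the four sources add up to the 21,847 pairs quoted above.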

Stage 2 — GRPO with execution rewards (16h)

GRPO with three reward signals: weighted execution match against BIRD SQLite databases (1.0), syntax validity via sqlglot (0.2), and code-fence formatting (0.1). 2000 steps total, num_generations=2, LR 5e-6.
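One way the three signals could be combined is sketched below. Only the weights (1.0, 0.2, 0.1) come from the description above. The function names, the set-based comparison of result rows, and the injected `syntax_ok` callable are assumptions made for illustration. The real run used sqlglot for the syntax check; a toy stand-in is used here so the sketch runs on the standard library alone.

```python
import re
import sqlite3

# Weights as stated above; everything else in this sketch is hypothetical.
W_EXEC, W_SYNTAX, W_FENCE = 1.0, 0.2, 0.1

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence
FENCE_RE = re.compile(FENCE + r"sql\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(completion):
    """Return (sql, fenced); fenced is True when a fenced sql block was found."""
    match = FENCE_RE.search(completion)
    if match:
        return match.group(1).strip(), True
    return completion.strip(), False


def run(conn, sql):
    """Execute sql and return its rows as a set, or None if execution fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def reward(completion, gold_sql, conn, syntax_ok):
    """Weighted sum of execution match, syntax validity and fence formatting."""
    sql, fenced = extract_sql(completion)
    predicted, expected = run(conn, sql), run(conn, gold_sql)
    exec_match = predicted is not None and predicted == expected
    return W_EXEC * exec_match + W_SYNTAX * bool(syntax_ok(sql)) + W_FENCE * fenced


def toy_syntax_ok(sql):
    """Toy stand-in for a real parser-based check such as sqlglot."""
    return sql.lstrip().upper().startswith("SELECT")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

gold = "SELECT name FROM t WHERE id = 1"
good = FENCE + "sql\n" + gold + "\n" + FENCE
print(round(reward(good, gold, conn, toy_syntax_ok), 2))
print(round(reward("SELECT nme FROM t", gold, conn, toy_syntax_ok), 2))
```

A fenced, correct query collects all three components, while an unfenced query that references a missing column keeps only the syntax component.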

The 16-hour wall time comes from running rollouts without vLLM: the available vLLM wheels are built for CUDA 13 and don't load on my CUDA 12.8 Blackwell driver. With vLLM, I estimate the same 2000 steps would have taken closer to 2–3 hours.

What GRPO actually fixed

Looking at the 500 BIRD-PG examples, GRPO fixed 12 queries that SFT got wrong and broke 6 that SFT had right — net +6, plus the broader trend of better dialect awareness.

The biggest improvement was dialect awareness. SFT kept generating MONTH(date) — that's MySQL syntax and just fails on Postgres. GRPO learned EXTRACT(MONTH FROM date) from the executions that came back as errors.

It also got better at date formats. SFT was guessing patterns like LIKE '%/%/87%' (assuming mm/dd/yy), which returned empty result sets against dates stored as YYYY-MM-DD. GRPO settled on LIKE '%1987%' after enough wrong-answer signals.
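The date-format failure is easy to reproduce. The sketch below uses an in-memory SQLite table with a hypothetical `events` schema (not taken from BIRD) whose dates are stored as YYYY-MM-DD text, and counts the rows matched by each pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, happened_on TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "1987-03-14"), (2, "1987-11-02"), (3, "1992-06-30")],
)


def count(pattern):
    """Number of rows whose date text matches the given LIKE pattern."""
    query = "SELECT COUNT(*) FROM events WHERE happened_on LIKE ?"
    return conn.execute(query, (pattern,)).fetchone()[0]


slash_hits = count("%/%/87%")  # assumes mm/dd/yy text: matches nothing here
year_hits = count("%1987%")    # matches both 1987 rows
print(slash_hits, year_hits)   # prints "0 2"
```

The first pattern executes without error and simply returns an empty result, so only a reward based on comparing results can penalize it.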

A smaller but interesting one: it learned when not to quote identifiers. SFT was over-quoting in cases where the DDL was unquoted, which broke case-sensitive matches.
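The quoting problem follows from PostgreSQL's identifier rules: unquoted identifiers are folded to lower case, while double-quoted ones keep their case. The function below is a simplified model of that rule for ASCII names (it ignores length truncation), and the column name is made up for the example.

```python
def resolve(identifier):
    """Name PostgreSQL looks up for an identifier as written in a query."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        # Quoted: case is preserved; a doubled quote is an escaped quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted: folded to lower case.
    return identifier.lower()


# Hypothetical DDL written without quotes: CREATE TABLE schools (SchoolName text)
stored = resolve("SchoolName")

for written in ["SchoolName", "schoolname", '"SchoolName"', '"schoolname"']:
    status = "ok" if resolve(written) == stored else "column does not exist"
    print(f"{written:<14} -> {status}")
```

Under this model, quoting the mixed-case spelling is the one form that misses a column declared without quotes, which matches the over-quoting failure described above.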

Limitations

This is not a general SQL assistant for every dialect — it's tuned around PostgreSQL and SQLite specifically. Behavior on MySQL or SQL Server isn't validated.

The 7B size still shows up on harder examples. Challenging-tier BIRD-PG accuracy is 23.5%, and schema grounding is imperfect on tables with 30+ columns, where most remaining errors are hallucinated column names. My guess is that's a 7B context-handling limitation more than a training-data issue.

GRPO has its own failure mode I observed in the eval comparison: it occasionally over-quotes identifiers or adds unnecessary DISTINCT clauses. The 6 regressions across 500 BIRD-PG examples (against the SFT baseline) come from this pattern. The net gain was still positive, but it's one weakness of binary execution rewards — the model can't always distinguish between "succeeded because of better grounding" and "succeeded because of incidental stylistic choices in the rollout."

Notes for Blackwell laptops

On RTX 5080 / sm_120, vLLM CUDA 13 wheels didn't load on the CUDA 12.x runtime, so both stages trained through Unsloth's Triton fallback (no flash-attn, no nvcc). AdamW 8-bit produced NaNs within the first 100 SFT steps every time; adamw_torch_fused with LR 1e-5 and grad clipping at 0.5 stabilized SFT. For GRPO, the key stability fix was catching every exception type from sqlglot in the reward function — a TokenError from an unterminated string literal in one rollout crashed the run at step 320 the first time around.
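The reward-function fix amounts to never letting a parser exception escape into the training loop. A minimal sketch of that pattern is below. The decorator name, the toy parser, and the local `TokenError` class are stand-ins so the example runs without sqlglot installed. In the real reward function the wrapped call would be the sqlglot parse.

```python
import functools


def never_raise(default=0.0):
    """Decorator: turn any exception inside a reward function into `default`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:  # deliberately broad: parse errors, token errors, ...
                return default
        return wrapper
    return decorate


class TokenError(Exception):
    """Local stand-in for a tokenizer error such as sqlglot's TokenError."""


def toy_parse(sql):
    # Hypothetical parser: rejects an odd number of single quotes.
    if sql.count("'") % 2:
        raise TokenError("unterminated string literal")
    return sql


@never_raise(default=0.0)
def syntax_reward(sql):
    """0.2 when the query parses; the decorator maps any failure to 0.0."""
    toy_parse(sql)
    return 0.2


print(syntax_reward("SELECT 'ok'"))    # 0.2
print(syntax_reward("SELECT 'oops"))   # 0.0, the TokenError is swallowed
```

Catching `Exception` rather than one specific error class is the point: a single rollout with malformed SQL then costs that sample its reward instead of ending the run.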