Slonik-7B-GRPO
A PostgreSQL and SQLite text-to-SQL model, fine-tuned from Qwen2.5-Coder-7B-Instruct via QLoRA SFT followed by 2000-step GRPO with execution-based rewards.
Related repos:
- Phani-labs/Slonik-7B-SFT — the SFT-only baseline (33.20% BIRD-PG)
- Phani-labs/Slonik-7B-GRPO-GGUF — GGUF quantizations for Ollama / llama.cpp / LM Studio
Why I built this
I wanted a small text-to-SQL model that handled real PostgreSQL and SQLite questions — JSONB access, pgvector similarity, full-text search, window functions, deep CTEs — and was small enough to run locally. SFT alone got partway there (33.20% on BIRD-PG, already past most open 7B and 32B baselines), but the model still produced syntactically clean queries that referenced columns the schema didn't have. That's the pattern execution-based RL is built to fix.
This is the GRPO version. It scores 38.20% on BIRD-PG, 5.0 points above the SFT baseline, with the clearest improvements on dialect-specific errors.
Results
BIRD Mini-Dev (official 500-example benchmark, execution accuracy):
| Model | BIRD-PG | BIRD-SQLite | Size / type |
|---|---|---|---|
| o3-mini | 47.78% | — | reasoning |
| Claude 3.7 Sonnet | 39.26% | — | proprietary |
| Slonik-7B-GRPO (this) | 38.20% | 45.20% | 7B |
| GPT-4o | 34.44% | — | proprietary |
| Slonik-7B-SFT (sibling) | 33.20% | — | 7B |
| Qwen2.5-Coder-32B | 22.96% | — | 32B |
| Codestral 22B | 21.11% | — | 22B |
| Qwen2.5-Coder-7B (base) | 12.22% | — | 7B |
Performance by difficulty
| Tier | BIRD-PG | BIRD-SQLite |
|---|---|---|
| Simple | 56.1% | 66.2% |
| Moderate | 33.6% | 38.0% |
| Challenging | 23.5% | 32.4% |
SFT → GRPO trajectory on BIRD-PG
| Stage | Overall | Simple | Moderate | Challenging |
|---|---|---|---|---|
| Base Qwen2.5-Coder-7B | 12.22% | — | — | — |
| Slonik-7B-SFT | 33.20% | 48.6% | 29.6% | 19.6% |
| Slonik-7B-GRPO (500 steps) | 34.60% | 49.3% | 31.2% | 21.6% |
| Slonik-7B-GRPO (2000 steps) | 38.20% | 56.1% | 33.6% | 23.5% |
The largest absolute gains over SFT were on the simple (+7.5 pts) and moderate (+4.0 pts) tiers. The challenging tier moved slightly less (+3.9 pts), which lines up with what a 7B model can do on a short context budget.
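The per-tier gains can be recomputed from the trajectory table. This short check uses only the SFT and 2000-step GRPO rows; the variable names are illustrative.

```python
# BIRD-PG execution accuracy (%) copied from the trajectory table.
sft = {"overall": 33.20, "simple": 48.6, "moderate": 29.6, "challenging": 19.6}
grpo = {"overall": 38.20, "simple": 56.1, "moderate": 33.6, "challenging": 23.5}

# Absolute gain in percentage points, rounded to one decimal place.
gains = {tier: round(grpo[tier] - sft[tier], 1) for tier in sft}

for tier, gain in gains.items():
    print(f"{tier:<12} +{gain:.1f} pts")
```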
Training
Two stages on a single RTX 5080 Laptop GPU (16 GB VRAM, Blackwell sm_120). Total external cost about $3 (DeepSeek API for synthetic data generation).
Stage 1 — QLoRA SFT (8h 13min)
QLoRA fine-tune of Qwen2.5-Coder-7B-Instruct on 21,847 text-to-SQL pairs:
- BIRD-SQL train split — 6,601 examples
- Spider train split — 8,034 examples
- Gretel synthetic text-to-SQL PostgreSQL subset — 5,212 examples
- PG-Modern custom synthesis — 2,000 examples covering pgvector, JSONB, full-text search, CTEs, window functions, and array operations
LoRA rank 32, alpha 64, 4-bit NF4 base, LR 1e-5, max_grad_norm 0.5, adamw_torch_fused. Final eval_loss 0.290.
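For reference, the hyperparameters above can be collected in one place. This is a dependency-free sketch, not the actual training script: the class and field names are hypothetical, and only the values come from the text. In the Hugging Face stack these would typically map onto `r` and `lora_alpha` in peft's `LoraConfig`, `bnb_4bit_quant_type` in `BitsAndBytesConfig`, and `learning_rate`, `max_grad_norm`, and `optim` in `TrainingArguments`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SftConfig:
    # Values are the ones stated in the text; field names are illustrative.
    lora_rank: int = 32
    lora_alpha: int = 64
    quantization: str = "nf4"  # 4-bit NF4 base weights
    learning_rate: float = 1e-5
    max_grad_norm: float = 0.5
    optimizer: str = "adamw_torch_fused"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


cfg = SftConfig()
print(cfg.lora_scaling)
```

With rank 32 and alpha 64 the adapter update is scaled by a factor of 2.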
Stage 2 — GRPO with execution rewards (16h)
GRPO with three weighted reward signals: execution match against the BIRD SQLite databases (weight 1.0), syntax validity via sqlglot (0.2), and code-fence formatting (0.1). 2000 steps total, num_generations=2, LR 5e-6.
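A minimal sketch of how three such signals could be combined into one scalar reward. Several things here are assumptions rather than the actual training code: all function names are hypothetical, result sets are compared order-insensitively, and the syntax check uses SQLite's `EXPLAIN` as a stand-in for sqlglot so the example runs on the standard library alone (unlike a pure parser, it also rejects unknown tables and columns).

```python
import re
import sqlite3

# Weights as stated in the text.
W_EXEC, W_SYNTAX, W_FENCE = 1.0, 0.2, 0.1

FENCE = "`" * 3  # built indirectly so this example can itself sit in a fence
FENCE_RE = re.compile(FENCE + r"sql\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)

# Older Python versions raise sqlite3.Warning (not an Error subclass) for
# multi-statement input, so both are caught.
SQLITE_FAILURES = (sqlite3.Error, sqlite3.Warning)


def extract_sql(completion: str) -> tuple[str, bool]:
    """Return (sql, fenced): the SQL text and whether it was in a sql fence."""
    match = FENCE_RE.search(completion)
    if match:
        return match.group(1).strip(), True
    return completion.strip(), False


def run_query(conn: sqlite3.Connection, sql: str):
    """Execute sql and return its rows sorted, or None if execution fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except SQLITE_FAILURES:
        return None


def is_valid_sql(conn: sqlite3.Connection, sql: str) -> bool:
    """Stand-in validity check: ask SQLite to plan the statement."""
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True
    except SQLITE_FAILURES:
        return False


def reward(completion: str, gold_sql: str, conn: sqlite3.Connection) -> float:
    sql, fenced = extract_sql(completion)
    gold_rows = run_query(conn, gold_sql)
    pred_rows = run_query(conn, sql)
    exec_match = gold_rows is not None and pred_rows == gold_rows
    return W_EXEC * exec_match + W_SYNTAX * is_valid_sql(conn, sql) + W_FENCE * fenced
```

Under these weights a fenced, valid, correct query scores 1.3, a fenced and valid query with the wrong result scores 0.3, and an unfenced query that fails to execute scores 0.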
The 16-hour wall time comes from running rollouts without vLLM (the available vLLM wheels are built for CUDA 13 and don't load on my CUDA 12.8 Blackwell driver). With vLLM, the same 2000 steps would have taken closer to 2–3 hours.
What GRPO actually fixed
Across the 500 BIRD-PG examples, GRPO fixed 12 queries that SFT got wrong and broke 6 that SFT had right, a net of +6, alongside a broader trend toward better dialect awareness.
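A paired comparison like this can be computed from per-example correctness flags for the two models. The sketch below uses toy data, not the real evaluation results, and the function name is hypothetical.

```python
def paired_diff(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, int]:
    """Count examples that flipped between two runs over the same question ids."""
    fixed = sum(1 for qid, ok in candidate.items() if ok and not baseline[qid])
    broken = sum(1 for qid, ok in candidate.items() if not ok and baseline[qid])
    return {"fixed": fixed, "broken": broken, "net": fixed - broken}


# Toy correctness flags keyed by question id (illustrative only).
sft_flags = {"q1": True, "q2": False, "q3": False, "q4": True}
grpo_flags = {"q1": True, "q2": True, "q3": True, "q4": False}

print(paired_diff(sft_flags, grpo_flags))
```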
The biggest improvement was dialect awareness. SFT kept generating MONTH(date) — that's MySQL syntax and just fails on Postgres. GRPO learned EXTRACT(MONTH FROM date) from the executions that came back as errors.
It also got better at date formats. SFT was guessing patterns like LIKE '%/%/87%' (assuming mm/dd/yy), which returned empty result sets against dates stored as YYYY-MM-DD. GRPO settled on LIKE '%1987%' after enough wrong-answer signals.
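The date-format failure is easy to reproduce. This sketch uses an in-memory SQLite database with a hypothetical `events` table whose dates are stored as YYYY-MM-DD text; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("1987-03-14",), ("1987-11-02",), ("1992-06-30",)],
)


def count_matching(pattern: str) -> int:
    """Count rows whose event_date matches the given LIKE pattern."""
    query = "SELECT COUNT(*) FROM events WHERE event_date LIKE ?"
    return conn.execute(query, (pattern,)).fetchone()[0]


slash_count = count_matching("%/%/87%")  # assumes mm/dd/yy, so nothing matches
year_count = count_matching("%1987%")    # matches the ISO-style year

print(slash_count, year_count)
```

The slash pattern executes without error but returns nothing, so only a result-set comparison against the gold query exposes it.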
A smaller but interesting one: it learned when not to quote identifiers. SFT was over-quoting in cases where the DDL was unquoted, which broke case-sensitive matches.
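The quoting problem comes from PostgreSQL's identifier rules: unquoted identifiers are folded to lowercase, while quoted ones keep their exact case. A small simulation of that rule, with a hypothetical helper and table name:

```python
def pg_resolved_name(identifier: str) -> str:
    """Name PostgreSQL looks up: quoted keeps its case, unquoted folds to lowercase."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


# DDL written without quotes, e.g. CREATE TABLE Customers (...), stores "customers".
stored = pg_resolved_name("Customers")

print(pg_resolved_name("customers") == stored)    # unquoted query matches
print(pg_resolved_name("CUSTOMERS") == stored)    # unquoted, any case, still matches
print(pg_resolved_name('"Customers"') == stored)  # over-quoted query does not match
```

So when the DDL is unquoted, adding quotes around a mixed-case name in the query turns a working reference into a missing-relation error.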
Limitations
This is not a general SQL assistant for every dialect — it's tuned around PostgreSQL and SQLite specifically. Behavior on MySQL or SQL Server isn't validated.
The 7B size still shows up on harder examples. Challenging-tier BIRD-PG accuracy is 23.5%, and schema grounding is imperfect on tables with 30+ columns, where most remaining errors are hallucinated column names. My guess is that's a 7B context-handling limitation more than a training-data issue.
GRPO has its own failure mode I observed in the eval comparison: it occasionally over-quotes identifiers or adds unnecessary DISTINCT clauses. The 6 regressions across 500 BIRD-PG examples (against the SFT baseline) come from this pattern. The net gain was still positive, but it's one weakness of binary execution rewards — the model can't always distinguish between "succeeded because of better grounding" and "succeeded because of incidental stylistic choices in the rollout."
Notes for Blackwell laptops
On RTX 5080 / sm_120, vLLM CUDA 13 wheels didn't load on the CUDA 12.x runtime, so both stages trained through Unsloth's Triton fallback (no flash-attn, no nvcc). AdamW 8-bit produced NaNs within the first 100 SFT steps every time; adamw_torch_fused with LR 1e-5 and grad clipping at 0.5 stabilized SFT. For GRPO, the key stability fix was catching every exception type from sqlglot in the reward function — a TokenError from an unterminated string literal in one rollout crashed the run at step 320 the first time around.
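The exception-handling fix can be sketched as a wrapper that treats any parser failure as a zero reward instead of letting it propagate into the training loop. The function and the stand-in parser below are hypothetical so that the example runs without sqlglot installed; the 0.2 default mirrors the syntax weight given earlier.

```python
from typing import Callable


def safe_syntax_reward(sql: str, parse: Callable[[str], object], weight: float = 0.2) -> float:
    """Return weight if parse(sql) succeeds, 0.0 if it raises anything at all."""
    try:
        parse(sql)
    except Exception:  # deliberately broad: parse errors, tokenizer errors, anything else
        return 0.0
    return weight


class FakeTokenError(Exception):
    """Stand-in for a tokenizer-level failure (not a real sqlglot class)."""


def fake_parse(sql: str) -> None:
    # Toy rule: an odd number of single quotes means an unterminated string.
    if sql.count("'") % 2:
        raise FakeTokenError("unterminated string literal")


print(safe_syntax_reward("SELECT 'ok'", fake_parse))
print(safe_syntax_reward("SELECT 'oops", fake_parse))
```

With sqlglot installed, `parse` could be something like `lambda s: sqlglot.parse_one(s, read="postgres")`. Catching only `ParseError` would miss `TokenError`, which is the kind of failure described above.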
Author
Phani
- GitHub: slonik-7b
- SFT-only baseline: Phani-labs/Slonik-7B-SFT
- GGUF quantizations: Phani-labs/Slonik-7B-GRPO-GGUF