QueryCraft β€” Phi-3 Mini Fine-Tuned for Text-to-SQL

Fine-tuned Phi-3 Mini 3.8B using QLoRA on 76,000 Text-to-SQL examples. Converts natural language questions into valid SQL queries.

Evaluation Results

Metric Base Model Fine-Tuned
Exact Match 0.0% 82.0%
Execution Accuracy 84.0% 96.0%
BLEU Score 55.79 96.42

Evaluated on 50 held-out validation examples not seen during training.

Model Details

Property Value
Base model microsoft/Phi-3-mini-4k-instruct (3.8B params)
Fine-tuning method QLoRA (4-bit NF4 + LoRA)
LoRA rank r=16, alpha=32
Trainable parameters 8,912,896 (0.23%)
Training examples 76,577
Training hardware NVIDIA RTX 5060 Ti 8GB
Training time 3 hours 2 minutes
Final train loss 0.5677
Max sequence length 256 tokens

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
adapter    = "Sid9797/querycraft-phi3-sql"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="cuda:0",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = '''### System:
You are a SQL expert. Given a database schema and a natural language question, generate a valid SQL query that answers the question. Output only the SQL query with no explanation.

### Schema:
CREATE TABLE employees (id INTEGER, name VARCHAR, department VARCHAR, salary FLOAT)

### Question:
What is the average salary by department?

### SQL:
'''

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

sql = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(sql.strip().split("\n")[0])
# SELECT AVG(salary) FROM employees GROUP BY department

Prompt Format

The model was trained on the Alpaca instruction format: System: You are a SQL expert. Given a database schema and a natural language question, generate a valid SQL query that answers the question. Output only the SQL query with no explanation. Schema: {CREATE TABLE statements} Question: {natural language question} SQL: {model generates SQL here}

Training Details

  • Dataset: b-mc2/sql-create-context β€” 78,577 examples with inline CREATE TABLE schemas
  • Train/Val split: 76,577 train / 2,000 validation (seeded shuffle)
  • Quantization: 4-bit NF4 with double quantization (bitsandbytes)
  • LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Optimizer: AdamW with cosine LR schedule, warmup_ratio=0.05
  • Effective batch size: 16 (batch_size=4, gradient_accumulation=4)
  • Packing: Enabled β€” short examples concatenated to fill 256-token sequences

Why the Base Model Scored 0% Exact Match

The base Phi-3 Mini, without fine-tuning, consistently wrapped SQL output in markdown code fences (```sql ... ```) and appended semicolons. This formatting breaks exact match evaluation even when the SQL logic is correct. Fine-tuning on consistently formatted examples eliminated this entirely.

Limitations

  • Optimised for single-table and simple multi-table queries
  • Schema must be provided as CREATE TABLE SQL statements
  • Best results on English-language questions
  • May struggle with highly complex nested subqueries

Links

Downloads last month
30
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Sid9797/querycraft-phi3-sql

Adapter
(840)
this model

Dataset used to train Sid9797/querycraft-phi3-sql