Qwen3-4B-Thinking-2507 Text-to-SQL Agent FT

This is a fine-tuned Qwen3-4B-Thinking-2507 model for agentic Text-to-SQL in Brazilian Portuguese. It was trained on filtered teacher trajectories so that a small local model can learn database-agent behavior: inspecting table metadata, executing exploratory SQL, recovering from SQL errors, and deciding whether to answer with SQL, ask for clarification, or abstain as unanswerable.

Code and reproducibility repository:

https://github.com/Boakpe/distilled-slms-for-text-to-sql-pt-br

Related collection:

https://huggingface.co/collections/Boakpe/distilled-slms-for-text-to-sql-pt-br

What This Model Is For

The model is intended to be used with the SQL-agent runtime in the GitHub repository. The runtime provides the tool interface used during training:

  • get_table_schema
  • execute_sql
  • final_answer

This is not a general chat model. It is specialized for tool-using Text-to-SQL workflows in Portuguese.

Training Data

The model was fine-tuned on:

https://huggingface.co/datasets/Boakpe/pt-br-agentic-text-to-sql-distilled-trajectories

The public dataset contains 7,442 message-only trajectories selected from LLM-judged correct agent conversations. Sensitive CPF/CNPJ-like values were pseudonymized before release.

Results

Primary benchmark: anonymized PostgreSQL/PostGIS environmental-registry database, 180 questions:

  • 90 SQL
  • 45 clarification
  • 45 unanswerable
Model Overall Strict SQL Relaxed SQL Non-SQL Clarification Unanswerable Runtime
Qwen3-4B-Thinking-2507 base 56.1 28.9 36.7 75.6 71.1 80.0 2h 42m
Qwen3-4B-Thinking FT 78.9 34.4 70.0 87.8 86.7 88.9 2h 01m
Qwen3.5-27B-Q3_K_M teacher 75.0 40.0 70.0 80.0 75.6 84.4 3h 52m

SQL accuracy by difficulty, relaxed execution:

Model Easy Medium Hard Expert
Qwen3-4B-Thinking-2507 base 70.0 43.3 20.0 0.0
Qwen3-4B-Thinking FT 80.0 86.7 63.3 20.0

Pass@K for the fine-tuned model:

Setting Overall Relaxed SQL Non-SQL Runtime
Pass@1 78.9 70.0 87.8 2h 01m
Pass@5 91.7 87.8 95.6 5h 31m

Out-of-domain benchmark: rede_saude_publica, 100 questions:

Model Overall SQL Non-SQL
Qwen3-4B-Thinking-2507 base 70.0 64.0 76.0
Qwen3-4B-Thinking FT 75.0 72.0 78.0
Qwen3.5-27B-Q3_K_M teacher 82.0 90.0 74.0

Recommended Inference

For local inference, the GGUF Q8_0 export is usually easier:

https://huggingface.co/Boakpe/Qwen3-4B-Thinking-2507-Text-to-SQL-Agent-FT-GGUF

Use the GitHub repository for the runnable agent and benchmark setup:

https://github.com/Boakpe/distilled-slms-for-text-to-sql-pt-br

Limitations

  • The model is specialized for the released SQL-agent protocol.
  • It can generate plausible but semantically wrong SQL, especially on hard and expert questions.
  • It should not be used as a production decision system without independent SQL validation.
  • Results may depend on inference server support for tool calling and the chat template.

License

Apache 2.0.

Downloads last month
14
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Boakpe/Qwen3-4B-Thinking-2507-Text-to-SQL-Agent-FT

Finetuned
(244)
this model

Dataset used to train Boakpe/Qwen3-4B-Thinking-2507-Text-to-SQL-Agent-FT

Collection including Boakpe/Qwen3-4B-Thinking-2507-Text-to-SQL-Agent-FT