⚠️ Conference talk demo — not production weights.

This model accompanies a conference keynote on local on-device AI. Published as a reference for the fine-tuning patterns shown on stage — not a deployable artefact. No security audit, no SLA, pinned to the talk's state.


Qwen3.5-4B FT (f16) — Tool Calling

Base model Qwen/Qwen3.5-4B (4.0B params)
License Tongyi Qianwen License — see MODEL_LICENSES.md
Training script finetune/train_qwen35_toolcalling.py
Method QLoRA r=16, α=16, 2 epochs, lr=2e-4 (via Unsloth — CUDA only)
Training data data/training-data/qwen35_toolcalling_{scenario}.jsonl (~1,300 hand-curated tool-call examples)
Hardware CUDA required (Unsloth dependency). Tested on RTX PRO 6000.
Intended use Tool selection (sql_query / calculator) + argument generation. Native OpenAI tool-calling format. enable_thinking=False to keep output clean for llama.cpp's autoparser.
Out of scope Free-form chat, RAG synthesis, intent classification. The model is trained only on tool-call outputs.
Reference eval (Nextera, v9 post-2026-05-15 retrain) Tool routing: 99.4%. Multi-step decomposition (gemma3-ft): 98.8%. Multi-step chain shape: 97.5% (78/80, deterministic — verified byte-identical across 3 runs). SQL exec validity: 100% (79/79). Calculator expression correctness: 95.0%.
Known failure modes Occasionally generates <think> blocks despite enable_thinking=false — the _strip_thinking filter in src/engine/inference/client.py handles this at parse time. Will refuse to answer if the query is clearly outside both tools (correct behaviour, but eval treats as "wrong tool").
Downloads last month
81
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for thinktecture/qwen3.5-4b-toolcalling-ft-nextera-q4_k_m

Finetuned
Qwen/Qwen3.5-4B
Quantized
(262)
this model