prism-coder:9b — Prism Memory Tool Router (Default Tier)

QLoRA fine-tuned Qwen3.5-9B for MCP tool routing in the Prism Coder system. Replaces the previous 14B tier with 36% smaller footprint and higher accuracy.

Quick Start

ollama pull dcostenco/prism-coder:9b

BFCL Benchmark — 100% × 3 seeds

64/64 × 3 shuffled runs = 100.0%, 0 hallucinations

Category	Count	Accuracy
simple	10	100%
relevance_detection	10	100%
hallucination	10	100%
disambiguation	8	100%
format_sensitivity	5	100%
ast_parameter	5	100%
edge_case	8	100%
multi_turn_chain	8	100%

vs Previous 14B (Qwen 3)

Metric	14B (Qwen 3)	9B (Qwen 3.5)
BFCL Overall	90.3%	100.0%
Model Size	9.0 GB	5.8 GB
Hallucinations	0	0

Training

Method: QLoRA (4-bit base + bf16 adapters) on Apple Silicon M5 48GB
Base: mlx-community/Qwen3.5-9B-MLX-4bit for training, Qwen/Qwen3.5-9B bf16 for merge
LoRA: rank=32, alpha=64, 16 layers
Corpus: 26K rows (40% AAC / 12% abstention / 12% safety / 36% tool-use)
Iterations: 2000 at LR 1e-4
Layer 3: Inference-time rules for tool remapping, param normalization, multi-turn chain parsing

Architecture

Qwen3.5-9B uses a hybrid attention architecture:

Linear attention layers (Gated DeltaNet) — O(n) inference, pattern matching
Full attention layers (standard softmax) — precise retrieval and reasoning

Fleet Position

Model	Ollama tag	Size	BFCL	Role
Qwen3.5-2B Q3_K_M	`dcostenco/prism-coder:2b`	2.3 GB	99.1%	iPhone / mobile
Qwen3.5-4B Q4_K_M	`dcostenco/prism-coder:4b`	3.4 GB	100%	Verifier / 8 GB+
Qwen3.5-9B Q4_K_M	`dcostenco/prism-coder:9b`	5.8 GB	100%	Default router
prism-coder:32b	`dcostenco/prism-coder:32b`	19 GB	100%	Complex tasks

Model tree for dcostenco/prism-coder-9b

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

Qwen/Qwen3.5-9B

Finetuned

(377)

this model

dcostenco
/

prism-coder-9b