prism-coder:9b — Prism Memory Tool Router (Default Tier)

QLoRA fine-tuned Qwen3.5-9B for MCP tool routing in the Prism Coder system. Replaces the previous 14B tier with 36% smaller footprint and higher accuracy.

Quick Start

ollama pull dcostenco/prism-coder:9b

BFCL Benchmark — 100% × 3 seeds

64/64 × 3 shuffled runs = 100.0%, 0 hallucinations

Category Count Accuracy
simple 10 100%
relevance_detection 10 100%
hallucination 10 100%
disambiguation 8 100%
format_sensitivity 5 100%
ast_parameter 5 100%
edge_case 8 100%
multi_turn_chain 8 100%

vs Previous 14B (Qwen 3)

Metric 14B (Qwen 3) 9B (Qwen 3.5)
BFCL Overall 90.3% 100.0%
Model Size 9.0 GB 5.8 GB
Hallucinations 0 0

Training

  • Method: QLoRA (4-bit base + bf16 adapters) on Apple Silicon M5 48GB
  • Base: mlx-community/Qwen3.5-9B-MLX-4bit for training, Qwen/Qwen3.5-9B bf16 for merge
  • LoRA: rank=32, alpha=64, 16 layers
  • Corpus: 26K rows (40% AAC / 12% abstention / 12% safety / 36% tool-use)
  • Iterations: 2000 at LR 1e-4
  • Layer 3: Inference-time rules for tool remapping, param normalization, multi-turn chain parsing

Architecture

Qwen3.5-9B uses a hybrid attention architecture:

  • Linear attention layers (Gated DeltaNet) — O(n) inference, pattern matching
  • Full attention layers (standard softmax) — precise retrieval and reasoning

Fleet Position

Model Ollama tag Size BFCL Role
Qwen3.5-2B Q3_K_M dcostenco/prism-coder:2b 2.3 GB 99.1% iPhone / mobile
Qwen3.5-4B Q4_K_M dcostenco/prism-coder:4b 3.4 GB 100% Verifier / 8 GB+
Qwen3.5-9B Q4_K_M dcostenco/prism-coder:9b 5.8 GB 100% Default router
prism-coder:32b dcostenco/prism-coder:32b 19 GB 100% Complex tasks

Links

Downloads last month
-
Safetensors
Model size
10B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for dcostenco/prism-coder-9b

Finetuned
Qwen/Qwen3.5-9B
Finetuned
(377)
this model