prism-coder:4b — Prism Memory Tool Router

Prompt-engineered Qwen3.5-4B for MCP tool routing in the Prism Coder system. No fine-tuning — the system prompt IS the specialization.

Downloads

File	Quantization	Size	BFCL Accuracy	Use when
`Qwen3.5-4B-Q3_K_M.gguf`	Q3_K_M	2.3 GB	99.1% × 3 seeds	iPhone / mobile first gate
(stock via Ollama)	Q4_K_M	3.4 GB	100% × 3 seeds	Mac / 8 GB+ devices

Quick Start

# iPhone-optimized (2.3 GB, 99.1%)
ollama pull dcostenco/prism-coder:2b

# Full quality (3.4 GB, 100%)
ollama pull dcostenco/prism-coder:4b

BFCL Benchmark

Q3_K_M (prism-coder:2b) — 99.1% × 3 seeds

114/115 × 3 shuffled runs = 99.1%, 1 flaky case

Category	Count	Accuracy
save	17	100%
smem	17	100%
aac	12	100%
hand	12	100%
irrel	10	90%
load	9	100%
pred	8	100%
know	7	100%
cmpct	6	100%
edge	6	100%
tran	6	100%
info	5	100%

Single failure: "Write a regex to match email addresses" → knowledge_search instead of plain.

Q4_K_M (prism-coder:4b) — 100% × 3 seeds

115/115 × 3 shuffled runs = 100.0%, 0 flaky

Architecture

Qwen3.5-4B uses a hybrid attention architecture:

24 linear attention layers (Gated DeltaNet) — O(n) inference
8 full attention layers (standard softmax) — precise retrieval

This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.

Fleet Position

Model	Ollama tag	Size	BFCL	Role
Qwen3.5-4B Q3_K_M	`dcostenco/prism-coder:2b`	2.3 GB	99.1%	iPhone / mobile
Qwen3.5-4B Q4_K_M	`dcostenco/prism-coder:4b`	3.4 GB	100%	Verifier / 8 GB+
Qwen3.5-9B Q4_K_M	`dcostenco/prism-coder:9b`	5.8 GB	100%	Default router
prism-coder:32b	`dcostenco/prism-coder:32b`	19 GB	100%	Complex tasks

Model tree for dcostenco/prism-coder-2b

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B