prism-coder:4b β Prism Memory Tool Router
Prompt-engineered Qwen3.5-4B for MCP tool routing in the Prism Coder system. No fine-tuning β the system prompt IS the specialization.
Downloads
| File | Quantization | Size | BFCL Accuracy | Use when |
|---|---|---|---|---|
Qwen3.5-4B-Q3_K_M.gguf |
Q3_K_M | 2.3 GB | 99.1% Γ 3 seeds | iPhone / mobile first gate |
| (stock via Ollama) | Q4_K_M | 3.4 GB | 100% Γ 3 seeds | Mac / 8 GB+ devices |
Quick Start
# iPhone-optimized (2.3 GB, 99.1%)
ollama pull dcostenco/prism-coder:2b
# Full quality (3.4 GB, 100%)
ollama pull dcostenco/prism-coder:4b
BFCL Benchmark
Q3_K_M (prism-coder:2b) β 99.1% Γ 3 seeds
114/115 Γ 3 shuffled runs = 99.1%, 1 flaky case
| Category | Count | Accuracy |
|---|---|---|
| save | 17 | 100% |
| smem | 17 | 100% |
| aac | 12 | 100% |
| hand | 12 | 100% |
| irrel | 10 | 90% |
| load | 9 | 100% |
| pred | 8 | 100% |
| know | 7 | 100% |
| cmpct | 6 | 100% |
| edge | 6 | 100% |
| tran | 6 | 100% |
| info | 5 | 100% |
Single failure: "Write a regex to match email addresses" β knowledge_search instead of plain.
Q4_K_M (prism-coder:4b) β 100% Γ 3 seeds
115/115 Γ 3 shuffled runs = 100.0%, 0 flaky
Architecture
Qwen3.5-4B uses a hybrid attention architecture:
- 24 linear attention layers (Gated DeltaNet) β O(n) inference
- 8 full attention layers (standard softmax) β precise retrieval
This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.
Fleet Position
| Model | Ollama tag | Size | BFCL | Role |
|---|---|---|---|---|
| Qwen3.5-4B Q3_K_M | dcostenco/prism-coder:2b |
2.3 GB | 99.1% | iPhone / mobile |
| Qwen3.5-4B Q4_K_M | dcostenco/prism-coder:4b |
3.4 GB | 100% | Verifier / 8 GB+ |
| Qwen3.5-9B Q4_K_M | dcostenco/prism-coder:9b |
5.8 GB | 100% | Default router |
| prism-coder:32b | dcostenco/prism-coder:32b |
19 GB | 100% | Complex tasks |
Links
- Ollama model page β pull and run
- Prism MCP Server β the MCP server
- Qwen3.5-4B base β upstream model