prism-coder:4b — Full Prism Memory Router (Mid-Tier)

Fine-tuned Qwen3-4B for 17-tool Prism Memory routing in the Prism AAC system. Primary deployment: Mac / PC / high-memory mobile via Ollama or llama.cpp GGUF — for devices with ≥8 GB free RAM.

BFCL Routing Benchmark — v43 (Current)

100.0% (64/64 strict, 8 categories)

Category Count Description Accuracy
simple 10 Direct single-tool invocations 100%
relevance_detection 10 No-tool abstention for off-topic prompts 100%
hallucination 10 Reject fabricated / nonexistent tools 100%
disambiguation 8 Pick correct tool from near-neighbors 100%
format_sensitivity 5 Varied natural phrasing for same intent 100%
ast_parameter 5 Correct argument extraction 100%
edge_case 8 Boundary and adversarial inputs 100%
multi_turn_chain 8 Two-step tool sequences 100%

Eval: Ollama inference, temperature=0, greedy decode. Gate: ≥90% = deploy.

SWE Bench Blind Eval — v43

100.0% (68/68 strict, 7 categories) — held-out test set, no overlap with training data.

Category Count Accuracy
adversarial_trap 15 100%
cascade 10 100%
disambiguation 8 100%
edge_case 8 100%
multi_intent 4 100%
natural_phrasing 15 100%
verifier 8 100%

eval-300 — v43

100.0% (300/300 strict, 5 shuffled runs, 0 flaky tests)

Category Count Accuracy
abstention 20 100%
adversarial_trap 70 100%
cascade 25 100%
disambiguation 40 100%
edge_case 25 100%
multi_intent 20 100%
natural_phrasing 50 100%
param_extraction 25 100%
verifier 25 100%

Version History

Version BFCL SWE Bench eval-300 Notes
v43 100% 100% 100% Qwen3-4B base, 17-tool full router, Layer 3 inference-time remapping, 5 surgical patches

Tools

The model routes to 17 Prism Memory tools:

Tool Trigger
session_load_context Load / resume / catch me up on project context
session_save_ledger Jot down / log / note / record what we did
session_save_experience Log milestone / achievement / success event
session_save_handoff Save state for next agent / shift change
session_search_memory Recall / remind me / find what we decided
session_forget_memory Delete a specific memory entry by ID
session_export_memory Export session to file (JSON / Markdown)
session_compact_ledger Compact / prune old session entries
session_health_check Check session integrity
session_synthesize_edges Verify / rebuild session link graph
session_backfill_links Reconnect / patch missing session links
session_task_route Route a task to the right agent tier
knowledge_search Search knowledge base / accumulated docs
knowledge_forget Delete knowledge entries / wipe records
knowledge_upvote Upvote / boost / increase rank of entry
knowledge_downvote Downvote / lower rank of entry
knowledge_set_retention Set TTL / auto-expire / retention policy

Plain text (no tool) for: greetings, general questions, math, code help, weather, CS concepts.

Model Details

  • Base: Qwen/Qwen3-4B
  • Format: GGUF Q4_K_M (~2.3 GB)
  • Context: 32,768 tokens
  • Training: MLX LoRA on Apple Silicon, rank=32, alpha=64, 16/36 layers, LR=1e-4 (full) → 3e-5 (surgical patches), 5 patch rounds
  • Corpus: ~30K rows — 36% tool-use, 40% AAC/clinical, 12% abstention, 12% safety
  • Merge: direct safetensors delta merge (delta = (alpha/rank) × B.T @ A.T) — mlx_lm.fuse not used (silently drops LoRA weights)
  • Quantization: llama.cpp F16 → Q4_K_M

Usage

ollama pull dcostenco/prism-coder:4b-v43
ollama run dcostenco/prism-coder:4b-v43

Or drop the GGUF into any llama.cpp-compatible runtime (LM Studio, Jan, llama-server).

In Prism AAC the app loads this model automatically on devices with ≥8 GB free RAM.

Training Scripts

The training/ folder in this repo contains the full v43 training pipeline:

Script Purpose
build_4b_v43_corpus.py Full v43 corpus builder (~30K rows)
build_4b_v43_patch.py Patch 1 — initial BFCL failures
build_4b_v43_patch2.py Patch 2 — param extraction + format
build_4b_v43_patch4.py Patch 4 — task_route + casual phrasing
build_4b_v43_swe_patch.py Patch 5 — SWE bench targeted
combine_4b_swe_corpus.py Merge base + SWE patch corpus
train_4b_v43_local.sh MLX LoRA training (Apple Silicon)
train_4b_v43_swe_patch.sh Surgical SWE patch training run
merge_4b_v43.py Safe LoRA merge (delta = scale × B.T @ A.T)
export_4b_v43_gguf.sh HF safetensors → GGUF F16 → Q4_K_M → Ollama
orchestrate_4b_to_100.sh Autonomous patch→train→eval loop
bfcl_eval.py 64-test BFCL eval harness with Layer 3
swe_bench_test.py 68-test SWE blind eval harness
eval_300.py 300-test standard eval (9 categories)
analyze_swe_failures.py Parse failures → patch targets
TRAINING_DECISIONS_4B_V43.md Hyperparams, corpus ratios, lessons learned

Model Family

Model GGUF RAM Tools Repo
prism-coder:1b7 1.2 GB ≥3 GB 6 dcostenco/prism-coder-1.7b
prism-coder:4b 2.3 GB ≥8 GB 17 this repo
prism-coder:8b 4.9 GB ≥16 GB 6 dcostenco/prism-coder-8b
prism-coder:14b 8.4 GB ≥24 GB 6 + TypeScript dcostenco/prism-coder-14b
prism-coder:32b 16 GB ≥48 GB 6 dcostenco/prism-coder-32b
Downloads last month
-
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for dcostenco/prism-coder-4b

Finetuned
Qwen/Qwen3-4B
Quantized
(220)
this model