Prizma
⚠️ Research artifact / method, not a plug-and-play model. There are no task weights to load; this repository presents two small-scale research threads, their code, and their auditable results.
Two small-scale research threads built the same way: a pre-registered falsifiable bar, a
parameter/FLOP-matched baseline, an adversarial referee audit, and honest, binding
limits. No faked metrics: every number is produced by a reproducible script, and the raw result
JSONs are committed under results/.
| Thread | Question | Verdict (in the tested regime) |
|---|---|---|
| Prizma-Seq | Can a parameter-free quadratic delta-state sequence mixer stand in for attention at small scale? | Candidate: clears the §4 diagnostic bar param-matched vs a tuned Transformer; constant-memory + long-context O(1)-latency edge; honest losses disclosed. |
| Prizma | Can a backprop-free, fully-local learner do task-boundary-free continual learning? | Zero forgetting in the input-distinguishable regime, beating backprop & EWC, with no replay, no boundaries, no weight transport. |
Prizma-Seq: efficient-attention-replacement candidate
Prizma-Seq is a Gated-DeltaNet-family sequence mixer whose novel lever is a parameter-free
quadratic feature map (quad2) that makes the per-head carried associative state rectangular
(d_h × d_φ, with the monomials as fixed seeded buffers, so 0 added parameters). At small scale,
parameter-matched against a tuned decoder-only Transformer (RMSNorm + SwiGLU + RoPE), it clears
the project's pre-registered §4 bar:
| Leg | Verdict | Headline |
|---|---|---|
| MQAR (D=128) | PASS | parity @860K params; solves @130K where the matched TF needs ≥461K, i.e. ≥3.5× param-efficiency (coarse grid) |
| Induction | PASS | quad2 0.9995 (3/3) vs TF 0.996 |
| Selective-copy | PASS | selective 0.9991; a fixed-position control isolates content-selectivity |
| Char-LM (text8) | PASS | Prizma 1.7496 vs TF 1.7254 BPC: within the pre-registered +0.05 bar (does not beat TF) |
| Inference | PASS (memory) | constant 17.9 MB state ∀n (28–455× less); measured O(1)-latency crossover at n≥32k (2.4–2.8× faster @65k) |
| Causal ablation | PASS | quad2 ≫ rand_linear ≈ none ≫ TF: the gain is the quadratic monomials, not "a bigger RNN" |
| Length-extrapolation | WIN (relative) | 10× better retention than a RoPE Transformer at 8× train length (absolute accuracy still only ~0.40) |
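The memory row above contrasts a state that stays constant in n with a Transformer KV cache that grows linearly in n. A minimal sketch of that arithmetic follows; the layer count, head count, `d_h`, `d_phi`, and element sizes are hypothetical placeholders, not the configuration behind the reported 17.9 MB figure.

```python
def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_elem=2):
    """Keys + values stored for every past token in every layer (grows with n)."""
    return 2 * n_layers * n_tokens * d_model * bytes_per_elem


def delta_state_bytes(n_layers, n_heads, d_h, d_phi, bytes_per_elem=4):
    """One d_h x d_phi associative state per head per layer (independent of n)."""
    return n_layers * n_heads * d_h * d_phi * bytes_per_elem


# Hypothetical small-model configuration, for illustration only.
n_layers, n_heads, d_h, d_phi = 4, 4, 32, 1024
d_model = n_heads * d_h

state = delta_state_bytes(n_layers, n_heads, d_h, d_phi)
ratios = {}
for n in (4096, 16384, 65536):
    kv = kv_cache_bytes(n, n_layers, d_model)
    ratios[n] = kv / state
    print(f"n={n:6d}  KV={kv / 2**20:7.1f} MiB  "
          f"state={state / 2**20:4.1f} MiB  ratio={ratios[n]:5.1f}x")
```

Under these assumptions the cache-to-state ratio scales linearly with context length, so the advantage is small at short context and grows as n increases.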
Honest scope: a candidate, not a proven alternative. Char-LM is a loss-within-margin; the
latency win is long-context-only (Prizma is ~1.3–1.6× slower below n≈16k) and Prizma trains ~5×
slower per step (sequential delta); the FLOP-matched TF arms were optimization-confounded so no
per-FLOP claim is made; n=2–3 seeds are descriptive (not powered equivalence); large-scale LM
parity and backprop-free parity are NOT claimed (open frontiers). The quad2 kernel belongs to the
Based/Hedgehog feature-map family; the novelty is the rectangular-delta-state framing, not the kernel.
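To make the rectangular-delta-state framing concrete, here is a minimal NumPy sketch of a delta-rule associative state written through a fixed quadratic feature map. It is an illustration under stated assumptions, not the code in seq/: the names `make_quad2_buffers`, `quad2`, `delta_step`, and `read` are hypothetical, the choice of random index pairs and unit normalization is a guess at what "fixed seeded monomial buffers" could look like, and the decay gate of Gated DeltaNet is omitted.

```python
import numpy as np


def make_quad2_buffers(d_k, d_phi, seed=0):
    """Fixed, seeded index pairs (i, j); these are buffers, not trainable parameters."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, d_k, size=d_phi), rng.integers(0, d_k, size=d_phi)


def quad2(x, idx):
    """Map x (d_k,) to d_phi quadratic monomials x[i] * x[j], scaled to unit norm."""
    i, j = idx
    phi = x[i] * x[j]
    return phi / (np.linalg.norm(phi) + 1e-8)


def delta_step(S, k, v, beta, idx):
    """Delta rule: move the state's prediction for phi(k) toward v by a fraction beta."""
    phi_k = quad2(k, idx)
    return S + beta * np.outer(v - S @ phi_k, phi_k)


def read(S, q, idx):
    """Read the value associated with query q from the state."""
    return S @ quad2(q, idx)


d_k, d_h, d_phi = 16, 8, 64          # hypothetical sizes
idx = make_quad2_buffers(d_k, d_phi)
S = np.zeros((d_h, d_phi))           # rectangular per-head state: d_h x d_phi

rng = np.random.default_rng(1)
k, v = rng.standard_normal(d_k), rng.standard_normal(d_h)
S = delta_step(S, k, v, beta=1.0, idx=idx)
out = read(S, k, idx)                # querying with the stored key recovers v
```

The state has shape d_h × d_φ rather than d_h × d_k, so capacity can be raised by widening φ without adding learned weights, which is the property the paragraph above attributes to the design.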
- Full writeup + adversarial referee trail → docs/PRIZMA_SEQ_REPORT.md
- Raw A100 results (auditable) → results/gpu_{bench,diag,lengen,latency,charlm2}.json + results/v3_campaign_results.md
- Code → seq/ (mixer, tasks, transformer baseline), gpu_*.py (GPU runners), PRIZMA_run_*.ipynb (Colab bootstrap)
# local kernel self-tests / smoke (CPU/MPS), then the GPU runners on an A100:
PRIZMA_RESULTS=results python gpu_diag.py induction selcopy # B2/B3
PRIZMA_RESULTS=results python gpu_charlm2.py --skip_none # B4 (text8)
PRIZMA_RESULTS=results python gpu_latency.py # B5 latency/memory
PRIZMA_RESULTS=results python gpu_lengen.py # length-extrapolation
Prizma: backprop-free, fully-local continual learning
A backprop-free, fully-local, predictive-coding learning architecture targeting neuromorphic/analog hardware.
Prizma demonstrates task-boundary-free, task-label-free continual learning: in an input-distinguishable (domain-incremental) stream it reaches zero forgetting while beating naive backprop and (boundary-using) EWC, using only local learning rules (no backprop, no weight transport; works with random-feedback DFA). Its limits are characterized honestly: it provides no benefit in the fully-ambiguous regime (proven impossible for any single-head learner) and degrades gracefully as domains overlap.
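For readers unfamiliar with "no weight transport", the sketch below shows generic direct feedback alignment (DFA) on a two-layer network: the hidden layer receives the output error through a fixed random matrix `B` instead of the transpose of the forward weights. This is textbook DFA under assumed shapes and names (`dfa_grads`, `W1`, `W2`, `B`), not Prizma's predictive-coding rule or its routing mechanism.

```python
import numpy as np


def dfa_grads(W1, W2, B, x, y):
    """Local updates for a tanh hidden layer and a linear output layer."""
    h = np.tanh(W1 @ x)
    e = W2 @ h - y                        # output error
    dW2 = np.outer(e, h)                  # output layer: local delta rule
    delta_h = (B @ e) * (1.0 - h ** 2)    # error sent back through fixed B, not W2.T
    dW1 = np.outer(delta_h, x)
    return dW1, dW2, 0.5 * float(e @ e)


rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 2             # hypothetical sizes
W1 = 0.5 * rng.standard_normal((n_hid, n_in))
W2 = 0.5 * rng.standard_normal((n_out, n_hid))
B = rng.standard_normal((n_hid, n_out))   # fixed random feedback, never trained

x, y = rng.standard_normal(n_in), rng.standard_normal(n_out)
dW1, dW2, loss = dfa_grads(W1, W2, B, x, y)
W1 -= 0.05 * dW1
W2 -= 0.05 * dW2
```

Passing `W2.T` in place of `B` recovers the ordinary backprop gradient for the hidden layer, which is the "exact W^T" case the results table compares against.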
Headline result (E1, structured-permuted, 10 seeds, ±95% CI)
| Learner | ACC | FGT (forgetting ↓) | boundaries? | buffer? | W^T? |
|---|---|---|---|---|---|
| backprop MLP | 0.445 | 0.553 | - | - | - |
| EWC | 0.456 | 0.411 | yes | - | - |
| replay (buffer 1000) | 0.737 | 0.156 | yes | yes | - |
| oracle_multihead (upper bound) | 0.879 | 0.000 | task-id given | - | - |
| Prizma (DFA, no W^T) | 0.834 | 0.000 | none | none | none |
| Prizma (exact W^T) | 0.708 | 0.000 | none | none | yes |
| PRIZMA_noRoute (ablation) | 0.446 | 0.489 | - | - | - |
In accuracy, Prizma (DFA) sits between replay and the task-id oracle, and both Prizma variants match the oracle's zero forgetting without being told the task id and with no replay and no boundaries. The W^T-free DFA variant, which also needs no weight transport, performs best. The ablation (PRIZMA_noRoute falls back to backprop-level accuracy and forgetting) shows recognition-routing is the causal mechanism. Adversarially audited by a 4-referee panel (no leakage, fair, reproduces, honest).
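The ACC and FGT columns can be read with the following sketch, which uses a common continual-learning definition: final average accuracy, and the average drop from each task's best earlier accuracy to its final accuracy. The function name and the matrix `R` are illustrative assumptions; the metrics code in src/ may define these quantities differently, and in a boundary-free stream the rows of `R` would be evaluation checkpoints rather than task ends.

```python
def acc_and_forgetting(R):
    """R[i][j] = accuracy on task j measured after training stage i.

    Returns (ACC, FGT): mean final accuracy over all tasks, and mean drop from
    the best accuracy seen before the final stage to the final accuracy,
    averaged over every task except the last.
    """
    T = len(R)
    final = R[T - 1]
    acc = sum(final) / T
    if T < 2:
        return acc, 0.0
    drops = []
    for j in range(T - 1):
        best_earlier = max(R[i][j] for i in range(j, T - 1))
        drops.append(best_earlier - final[j])
    return acc, sum(drops) / (T - 1)


# Toy accuracy matrix for three tasks (made-up numbers, for illustration only).
R = [
    [0.9, 0.1, 0.1],
    [0.5, 0.9, 0.1],
    [0.3, 0.6, 0.9],
]
acc, fgt = acc_and_forgetting(R)
print(f"ACC={acc:.3f}  FGT={fgt:.3f}")
```

Under this definition a learner scores FGT = 0.000 only if no earlier task's accuracy ends below its earlier peak.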
- Full writeup (equations, borrowed-vs-new ledger, neuromorphic mapping, limits) → docs/Prizma_EN.md
- Code → src/ (prizma + baselines + data + metrics), experiments/ (E1–E5 suite + figure)
python3.13 -m venv .venv && ./.venv/bin/pip install numpy matplotlib
./.venv/bin/python experiments/run_continual.py # ~2.5 min → results/results.json
./.venv/bin/python experiments/make_figure.py # → results/figure.png
Status: research prototypes. Neither thread claims large-scale parity; each is a falsifiability gate passed (or honestly refused) in a precisely-characterized small-scale regime.
Citation
@misc{prizma2026,
author = {Aylin},
title = {Prizma: A Parameter-Free Quadratic Delta-State Sequence Mixer and a Backprop-Free Local Continual Learner},
year = {2026},
note = {Research artifact}
% eprint = {XXXX.XXXXX}, archivePrefix={arXiv}, primaryClass={cs.LG}
}
License
Released under the Apache-2.0 license.
Author
Aylin, Independent Researcher