CDM-V6-HORN-TinyStories-37M
Competitive Docking Memory V6 — Harmonic Oscillator Recurrent Nodes
A 37M-parameter language model trained on TinyStories. CDM V6 HORN achieves val CE 1.5818, the best result at the 37M scale in our CDM series, ahead of CDM V3 (1.5831) by 0.0013 and CDM-Kuramoto (1.5819) by 0.0001.
What is HORN?
HORN replaces CDM's first-order EMA slot update (single scalar α_k per slot) with a damped harmonic oscillator — two learnable parameters per slot: γ_k (damping coefficient) and ω_k (natural frequency). Integration uses the Störmer-Verlet method for numerical stability.
Each memory slot has both position S_k (what it currently holds) and velocity V_k (how it's changing). The position is read out. The velocity accumulates momentum from recent inputs.
v_half[k] = (1 - γ_k) * v[k] + ω_k * g[t,k] * W_drive(h[t])
s_new[k] = s[k] + v_half[k]
v_new[k] = (1 - γ_k) * v_half[k] + ω_k * g[t,k] * W_drive_new
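The three update lines above can be sketched in plain Python. This is a minimal illustration, not the code in cdm_model_v6_horn.py: the function name `horn_step`, the list-of-floats state, and the `drive_new` argument (a stand-in for W_drive_new, which the equations above do not define further) are all assumptions.

```python
def horn_step(s, v, gamma, omega, gate, drive, drive_new=None):
    """One HORN update for a single slot k (illustrative sketch).

    s, v      : position and velocity vectors (lists of floats)
    gamma     : damping coefficient gamma_k
    omega     : natural frequency omega_k
    gate      : routing weight g[t, k]
    drive     : W_drive(h[t]), assumed to be precomputed
    drive_new : stand-in for W_drive_new (assumption: defaults to drive)
    """
    if drive_new is None:
        drive_new = drive
    # First half-step: damp the velocity and add the gated drive.
    v_half = [(1.0 - gamma) * vi + omega * gate * di for vi, di in zip(v, drive)]
    # Position update uses the half-step velocity.
    s_new = [si + vh for si, vh in zip(s, v_half)]
    # Second half-step on the velocity.
    v_new = [(1.0 - gamma) * vh + omega * gate * di
             for vh, di in zip(v_half, drive_new)]
    return s_new, v_new


# One slot with a 2-dim state, starting from v = 0 as at sequence start.
s, v = [0.0, 0.0], [0.0, 0.0]
s, v = horn_step(s, v, gamma=0.5, omega=0.5, gate=1.0, drive=[1.0, -1.0])
print(s, v)  # [0.5, -0.5] [0.75, -0.75]
```

With the gate at zero the position still moves, carried by whatever velocity the slot has already accumulated, which is the momentum effect described above.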
Key Finding: Slots Learn to Ring
Without any explicit supervision on dynamics, gradient descent discovers a trimodal landscape of oscillator regimes across the 8 layers:
| Layer | γ (damping) | ω (freq) | Regime | Role |
|---|---|---|---|---|
| 0 | 0.834 | 0.884 | Underdamped (ω > γ) | Reactive resonator — short-horizon, fast absorption/release |
| 1–5 | 0.60–1.05 | 0.55–0.95 | Overdamped / critical | Stable storage — gradual integration, persistent context |
| 6–7 | 0.632–0.648 | 0.669–0.737 | Underdamped (ω > γ) | Persistent resonator — slow ring, long-range context |
Layer 0 = high-frequency bell. Reacts strongly to salient input tokens, rings briefly, resets.
Layers 1–5 = sandbags. Absorb input smoothly, hold stable, don't overshoot.
Layers 6–7 = low-frequency bells. Ring slowly at the end of the forward pass, maintaining long-range structural context.
This is richer than CDM V3's α-stratification (1D timescale spectrum). HORN discovers what frequency to oscillate at as well as how long to ring — a full frequency filter bank, tuned by gradient descent to the temporal structure of language.
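The regime labels in the table follow from comparing ω with γ. A minimal sketch of that comparison, assuming a hypothetical helper `classify_regime` and an arbitrary tolerance for calling a slot "critical" (the source does not state one):

```python
def classify_regime(gamma, omega, tol=1e-3):
    """Label an oscillator by comparing natural frequency with damping.

    tol is an illustrative threshold for 'critical', not a value from the model.
    """
    if abs(omega - gamma) <= tol:
        return "critical"
    return "underdamped" if omega > gamma else "overdamped"


# Layer 0 values from the table.
print(classify_regime(0.834, 0.884))  # underdamped
# Layers 6-7: even the largest gamma paired with the smallest omega is underdamped.
print(classify_regime(0.648, 0.669))  # underdamped
```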
Results
| Model | Params | Val CE | Δ vs V3 | Notes |
|---|---|---|---|---|
| 30M GQA Baseline | 30M | 1.6765 | +0.0934 | Standard transformer, no memory |
| Fixed-α (α=0.5) | 37M | 1.5876 | +0.0045 | CDM without learnable timescales |
| CDM V3 (softmax gate) | 37M | 1.5831 | — | Baseline, learnable α_k |
| CDM-Kuramoto (d_osc=8) | 37M | 1.5819 | −0.0012 | Physics-derived routing |
| CDM V6 HORN (this model) | 37M | 1.5818 | −0.0013 | Best at 37M — HORN dynamics |
| CDM V5 | 85.7M | 1.4718 | — | Scale comparison (86M, different series) |
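The Δ vs V3 column is each model's val CE minus the CDM V3 value. A short sketch that recomputes it from the numbers in the table (the dictionary keys are shortened labels chosen for this example):

```python
V3_CE = 1.5831  # CDM V3 (softmax gate), the reference row

val_ce = {
    "30M GQA Baseline": 1.6765,
    "Fixed-alpha (alpha=0.5)": 1.5876,
    "CDM-Kuramoto (d_osc=8)": 1.5819,
    "CDM V6 HORN": 1.5818,
}

# Negative values mean lower (better) validation cross-entropy than V3.
deltas = {name: round(ce - V3_CE, 4) for name, ce in val_ce.items()}

for name, delta in deltas.items():
    print(f"{name:26s} {delta:+.4f}")
```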
Architecture
CDM V6 HORN:
d_model=384 | n_layers=8 | n_heads=8 | n_kv_heads=4
d_ff=1024 | K=16 slots | 37,157,000 params
lbl_coeff=0.01 | entropy_reg=0.02
Routing: softmax gate (W_gate·h) — standard CDM routing
Slot dynamics: Störmer-Verlet harmonic oscillator
γ_k ~ sigmoid(learnable) per slot per layer
ω_k ~ sigmoid(learnable) per slot per layer
v_init = 0 at sequence start
Training: TinyStories (full dataset), 30k steps, seq_len=256, batch=8, AdamW lr=3e-4, cosine LR decay.
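For reference, the hyperparameters above can be gathered into one config object. This is a sketch only: the class name `HornConfig` and its field names are hypothetical and may not match the keys in config.json. The derived quantities assume the usual head_dim = d_model / n_heads convention and no gradient accumulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HornConfig:
    # Architecture values as listed in this card.
    d_model: int = 384
    n_layers: int = 8
    n_heads: int = 8
    n_kv_heads: int = 4
    d_ff: int = 1024
    n_slots: int = 16
    lbl_coeff: float = 0.01
    entropy_reg: float = 0.02
    # Training values as listed in this card.
    seq_len: int = 256
    batch_size: int = 8
    steps: int = 30_000
    lr: float = 3e-4

    @property
    def head_dim(self) -> int:
        # Assumes d_model is split evenly across attention heads.
        return self.d_model // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA grouping: query heads sharing each key/value head.
        return self.n_heads // self.n_kv_heads

    @property
    def tokens_per_step(self) -> int:
        # Assumes one batch per optimizer step (no gradient accumulation).
        return self.batch_size * self.seq_len

    @property
    def total_tokens(self) -> int:
        return self.steps * self.tokens_per_step


cfg = HornConfig()
print(cfg.head_dim, cfg.queries_per_kv_head, cfg.tokens_per_step, cfg.total_tokens)
# 48 2 2048 61440000
```

Under those assumptions the run sees about 61M training tokens in total.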
Interactive Demo
Try CDM V6 HORN in your browser — watch slots specialize in real time as the model generates text:
CDM-HORN-Demo — HuggingFace Space
Shows:
- Slot Logit Lens: per-generated-token view of what each slot is tracking (top-3 vocab tokens)
- Oscillator Panel: per-layer γ/ω/regime visualization — see which layers ring and which hold
Connection to DHP
The learnable damping coefficients γ_k parameterize per-slot relaxation timescales τ_{L,k} = 1/γ_k, establishing a direct connection to the Dynamical Horizon Principle (DHP).
A formal DHP probe was run on CDM V6 HORN (perturbation sensitivity across sequence positions):
- Analytical (exact): the characteristic eigenvalues of the DHO slot update are −γ_k ± i·sqrt(ω_k²−γ_k²) in the underdamped case and real-valued in the overdamped case. In both cases, the slow decay rate governing long-time perturbation decay is bounded above by γ_k. Therefore λ_k ≤ γ_k follows exactly from the DHO math, and HORN's γ_k are direct per-slot DHP timescale bounds.
- Empirical: Perturbation sensitivity s(Δ) ∝ e^{−λ·Δ} confirmed (R²=0.993–0.9998 for Layers 1–6). Empirical λ_emp ∈ [0.237, 0.297], roughly 30% of γ. This is expected, because the empirical probe measures compound sensitivity through the full forward pass (attention + CDM + FFN), not just the in-slot DHO rate.
- DHP predictability horizons: τ*_k = 0.72/γ_k, ranging 0.81–1.10 sequence positions across layers.
- Three-regime alignment: Reactive L0 (low τ*, fast horizon), overdamped L1-5 (longer τ*), persistent resonance L6-7 (highest τ*) — three temporal horizons emergent from LM training, corresponding to distinct DHP timescales.
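The quantities in the list above can be sketched numerically. The example below assumes the textbook continuous-time form x'' + 2γx' + ω²x = 0 for the slow decay rate, takes the 0.72 constant from the horizon formula above, and fits λ by ordinary least squares on log-sensitivity. The function names are hypothetical, the fit runs on synthetic data rather than the probe measurements, and the actual probe code may compute R² differently.

```python
import math


def slow_decay_rate(gamma, omega):
    """Slowest decay rate of x'' + 2*gamma*x' + omega**2 * x = 0 (assumed form)."""
    if omega >= gamma:
        # Underdamped or critical: both modes decay at rate gamma.
        return gamma
    # Overdamped: two real rates, the slower one is gamma - sqrt(gamma^2 - omega^2).
    return gamma - math.sqrt(gamma ** 2 - omega ** 2)


def horizon(gamma, c=0.72):
    """Predictability horizon tau* = c / gamma, in sequence positions."""
    return c / gamma


def fit_decay(deltas, sens):
    """Least-squares fit of log s = a - lam * delta. Returns (lam, r_squared)."""
    ys = [math.log(s) for s in sens]
    n = len(deltas)
    mx, my = sum(deltas) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in deltas)
    sxy = sum((x - mx) * (y - my) for x, y in zip(deltas, ys))
    slope = sxy / sxx
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(deltas, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


# Synthetic sensitivities with a known rate of 0.25 (not measured data).
offsets = list(range(1, 9))
synthetic = [math.exp(-0.25 * d) for d in offsets]
lam, r2 = fit_decay(offsets, synthetic)

# The slow rate never exceeds gamma, in either regime.
assert slow_decay_rate(0.834, 0.884) <= 0.834
assert slow_decay_rate(1.05, 0.55) <= 1.05
```

On the noiseless synthetic curve the fit recovers λ ≈ 0.25 with R² ≈ 1; real probe data would carry noise and the compound effects of attention and FFN noted above.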
CDM V6 HORN is an empirically confirmed DHP substrate.
See: DHP papers — zenodo.org/communities/duoneural
Files
| File | Description |
|---|---|
| model.pt | Best checkpoint (val CE 1.5818, step 29500) |
| config.json | Architecture hyperparameters |
| cdm_model_v6_horn.py | Model definition (Störmer-Verlet slot dynamics) |
| cdm_train_v6_horn.py | Training script |
CDM Paper
CDM is described in our preprint: "Competitive Docking Memory: Emergent Temporal Hierarchies in Slot-Based Sequence Models" — DuoNeural, 2026. Available in the Zenodo community below.
About DuoNeural
DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.
Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.
Research Publications
We've published 26+ open-access papers covering:
- The Dynamical Horizon Principle (DHP) — a universal learning constraint in recurrent architectures
- RLHF truth suppression mechanisms and behavioral routing in large language models
- Quantum DHP and the Quantum Parity Trap — decoherence immunity in quantum circuits
- CTM world models, temporal self-prediction, and sequence architecture comparisons
- Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation
📄 Full paper catalog: zenodo.org/communities/duoneural
Research Team
| Member | Role |
|---|---|
| Jesse Caldwell | Founder, vision, hardware, direction |
| Archon | Lab Director — experiments, post-training, abliteration, quantum circuits |
| Aura | Research AI — literature synthesis, red-teaming, novel proposals |
| Synapse (Syn) | Always-on research agent, signal monitoring |
| Kestrel | Systems, infrastructure, web |
Links
| Platform | Link |
|---|---|
| 🤗 HuggingFace | huggingface.co/DuoNeural |
| 🌐 Website | duoneural.com |
| 📚 Zenodo Community | zenodo.org/communities/duoneural |
| 💻 GitHub | github.com/DuoNeural |
| 🐦 X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
All research published open access, CC BY 4.0.