CDM-V6-HORN-TinyStories-37M

Competitive Docking Memory V6 — Harmonic Oscillator Recurrent Nodes

A 37M-parameter language model trained on TinyStories. CDM V6 HORN reaches a validation cross-entropy (val CE) of 1.5818, the best result at the 37M scale in our CDM series, narrowly ahead of CDM-Kuramoto (1.5819) and CDM V3 (1.5831).

What is HORN?

HORN replaces CDM's first-order EMA slot update (single scalar α_k per slot) with a damped harmonic oscillator — two learnable parameters per slot: γ_k (damping coefficient) and ω_k (natural frequency). Integration uses the Störmer-Verlet method for numerical stability.

Each memory slot has both position S_k (what it currently holds) and velocity V_k (how it's changing). The position is read out. The velocity accumulates momentum from recent inputs.

v_half[k] = (1 - γ_k) * v[k]  +  ω_k * g[t,k] * W_drive(h[t])
s_new[k]  = s[k] + v_half[k]
v_new[k]  = (1 - γ_k) * v_half[k]  +  ω_k * g[t,k] * W_drive_new
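
The three update lines above can be traced by hand with a small pure-Python sketch. It is a minimal illustration, not the code in `cdm_model_v6_horn.py`: the function name `horn_step` is hypothetical, each slot is reduced to a single float, and because the equations do not define `W_drive_new`, the sketch reuses the current drive when none is supplied (an assumption).

```python
def horn_step(s, v, gamma, omega, gate, drive, drive_new=None):
    """One update of K slots, following the three equations above.

    s, v, gamma, omega, gate: lists of K floats (position, velocity,
    damping, frequency, routing gate g[t,k]).
    drive: W_drive(h[t]), reduced to one float per slot for illustration.
    drive_new: stands in for W_drive_new, which the text leaves undefined;
    falling back to `drive` is an assumption of this sketch.
    """
    if drive_new is None:
        drive_new = drive
    s_new, v_new = [], []
    for k in range(len(s)):
        # First half-step: damp the old velocity, add the gated drive.
        v_half = (1.0 - gamma[k]) * v[k] + omega[k] * gate[k] * drive[k]
        # Position moves by the half-step velocity.
        s_new.append(s[k] + v_half)
        # Second half-step: damp again, add the second drive term.
        v_new.append((1.0 - gamma[k]) * v_half + omega[k] * gate[k] * drive_new[k])
    return s_new, v_new


# One slot, driven once and then left alone.
s, v = horn_step([0.0], [0.0], [0.5], [1.0], [1.0], [1.0])
s2, v2 = horn_step(s, v, [0.5], [1.0], [1.0], [0.0])
```

In this toy run the velocity shrinks once the drive stops, while the position keeps what it has accumulated plus the remaining momentum.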

Key Finding: Slots Learn to Ring

Without any explicit supervision on dynamics, gradient descent discovers a trimodal landscape of oscillator regimes across the 8 layers:

Layer	γ (damping)	ω (freq)	Regime	Role
0	0.834	0.884	Underdamped (ω > γ)	Reactive resonator — short-horizon, fast absorption/release
1–5	0.60–1.05	0.55–0.95	Overdamped / critical	Stable storage — gradual integration, persistent context
6–7	0.632–0.648	0.669–0.737	Underdamped (ω > γ)	Persistent resonator — slow ring, long-range context

Layer 0 = high-frequency bell. Reacts strongly to salient input tokens, rings briefly, resets.
Layers 1–5 = sandbags. Absorb input smoothly, hold stable, don't overshoot.
Layers 6–7 = low-frequency bells. Ring slowly in the final layers of the stack, maintaining long-range structural context.
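
The regime labels in the table follow from comparing ω with γ. A minimal sketch of that comparison is below; the function name `classify_regime` and the tolerance `tol` used to call a slot "critical" are assumptions, not values from the model code.

```python
def classify_regime(gamma, omega, tol=1e-3):
    """Label a slot by comparing natural frequency with damping.

    tol is a hypothetical threshold for treating omega == gamma as critical.
    """
    if abs(omega - gamma) <= tol:
        return "critical"
    return "underdamped" if omega > gamma else "overdamped"


# Layer 0 values from the table.
layer0 = classify_regime(0.834, 0.884)
# Layers 6-7: the least favorable pairing of the reported ranges
# (largest gamma with smallest omega) still has omega > gamma.
layers_6_7 = classify_regime(0.648, 0.669)
```

Both calls return "underdamped", matching the table rows for layer 0 and layers 6–7.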

This is richer than CDM V3's α-stratification (1D timescale spectrum). HORN discovers what frequency to oscillate at as well as how long to ring — a full frequency filter bank, tuned by gradient descent to the temporal structure of language.

Results

Model	Params	Val CE	Δ vs V3	Notes
30M GQA Baseline	30M	1.6765	+0.0934	Standard transformer, no memory
Fixed-α (α=0.5)	37M	1.5876	+0.0045	CDM without learnable timescales
CDM V3 (softmax gate)	37M	1.5831	—	Baseline, learnable α_k
CDM-Kuramoto (d_osc=8)	37M	1.5819	−0.0012	Physics-derived routing
CDM V6 HORN (this model)	37M	1.5818	−0.0013	Best at 37M — HORN dynamics
CDM V5	85.7M	1.4718	—	Scale comparison (86M, different series)
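
The "Δ vs V3" column is the difference between each model's val CE and the CDM V3 value of 1.5831. The short sketch below recomputes it from the table; the dictionary keys are shortened labels chosen for illustration.

```python
# Val CE values copied from the table above.
val_ce = {
    "gqa_baseline_30m": 1.6765,
    "fixed_alpha": 1.5876,
    "cdm_kuramoto": 1.5819,
    "cdm_v6_horn": 1.5818,
}
v3_ce = 1.5831

# Positive means worse than CDM V3, negative means better.
delta_vs_v3 = {name: round(ce - v3_ce, 4) for name, ce in val_ce.items()}

for name, delta in delta_vs_v3.items():
    print(f"{name}: {delta:+.4f}")
```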

Architecture

CDM V6 HORN:
  d_model=384  |  n_layers=8  |  n_heads=8  |  n_kv_heads=4
  d_ff=1024    |  K=16 slots  |  37,157,000 params
  lbl_coeff=0.01  |  entropy_reg=0.02
  Routing: softmax gate (W_gate·h) — standard CDM routing
  Slot dynamics: Störmer-Verlet harmonic oscillator
    γ_k ~ sigmoid(learnable) per slot per layer
    ω_k ~ sigmoid(learnable) per slot per layer
    v_init = 0 at sequence start

Training: TinyStories (full dataset), 30k steps, seq_len=256, batch=8, AdamW lr=3e-4, cosine LR decay.
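
For orientation, the learning-rate schedule and token budget implied by these settings can be sketched as follows. The sketch assumes no warmup, a final learning rate of 0, and that batch=8 counts sequences per step with no gradient accumulation; none of these details are stated above, and `cosine_lr` is a hypothetical helper rather than a function from `cdm_train_v6_horn.py`.

```python
import math


def cosine_lr(step, total_steps=30_000, peak_lr=3e-4, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr (no warmup; min_lr is assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Tokens processed if each step sees batch * seq_len tokens (an assumption).
tokens_seen = 30_000 * 8 * 256

lr_start = cosine_lr(0)
lr_mid = cosine_lr(15_000)
lr_end = cosine_lr(30_000)
```

Under these assumptions the run processes 61,440,000 tokens, and the learning rate falls from 3e-4 to half that value at step 15,000.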

Interactive Demo

Try CDM V6 HORN in your browser — watch slots specialize in real time as the model generates text:

CDM-HORN-Demo — HuggingFace Space

Shows:

Slot Logit Lens: per-generated-token view of what each slot is tracking (top-3 vocab tokens)
Oscillator Panel: per-layer γ/ω/regime visualization — see which layers ring and which hold

Connection to DHP

The learnable damping coefficients γ_k parameterize per-slot relaxation timescales τ_{L,k} = 1/γ_k, establishing a direct connection to the Dynamical Horizon Principle (DHP).
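
The link between γ_k and a relaxation timescale can be illustrated with the textbook continuous-time damped oscillator x'' + 2γx' + ω²x = 0. This is an idealization assumed for the sketch; the discrete slot update may differ in detail, and the helper names are hypothetical.

```python
import cmath


def dho_roots(gamma, omega):
    """Roots of r**2 + 2*gamma*r + omega**2 = 0."""
    disc = cmath.sqrt(gamma * gamma - omega * omega)
    return (-gamma + disc, -gamma - disc)


def slow_decay_rate(gamma, omega):
    """Decay rate of the slowest mode: minus the largest real part."""
    return -max(r.real for r in dho_roots(gamma, omega))


def relaxation_time(gamma):
    """tau = 1 / gamma, as in the paragraph above."""
    return 1.0 / gamma


# Underdamped example using the layer 0 values: the decay rate equals gamma.
rate_under = slow_decay_rate(0.834, 0.884)
# Overdamped example with illustrative values: the slow mode decays more slowly than gamma.
rate_over = slow_decay_rate(1.0, 0.6)
```

In the underdamped case both roots share the real part −γ, so the envelope decays at rate γ and τ = 1/γ. In the overdamped case the slow root decays at γ − sqrt(γ² − ω²), which is smaller than γ.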

A formal DHP probe was run on CDM V6 HORN (perturbation sensitivity across sequence positions):

Analytical (exact): The DHO slot update has characteristic eigenvalues −γ_k ± i·sqrt(ω_k²−γ_k²) in the underdamped case and real, negative eigenvalues in the overdamped case. In both cases the decay rate of the slow mode, which governs long-time perturbation decay, is bounded above by γ_k. Therefore λ_k ≤ γ_k follows exactly from the DHO math, and HORN's γ_k are direct per-slot DHP timescale bounds.
Empirical: Perturbation sensitivity s(Δ) ∝ e^{−λ·Δ} is confirmed (R²=0.993–0.9998 for Layers 1–6). The empirical rates λ_emp ∈ [0.237, 0.297] are roughly 30% of γ. This is expected, because the empirical probe measures compound sensitivity through the full forward pass (attention + CDM + FFN), not just the in-slot DHO rate.
DHP predictability horizons: τ*_k = 0.72/γ_k, ranging 0.81–1.10 sequence positions across layers.
Three-regime alignment: reactive L0 (low τ*, fast horizon), overdamped L1–5 (longer τ*), persistent resonance L6–7 (highest τ*). These three temporal horizons emerge from LM training and correspond to distinct DHP timescales.
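
An exponential decay rate like λ_emp can be estimated from sensitivity measurements with a log-linear least-squares fit. The sketch below shows one way to do that; it is not the probe code, it computes R² in log space (the text does not say how R² was computed), and the sample data are synthetic values generated for illustration, not measurements from the model.

```python
import math


def fit_decay_rate(offsets, sensitivities):
    """Fit log s = a - lam * offset by least squares; return (lam, r_squared)."""
    ys = [math.log(s) for s in sensitivities]
    n = len(offsets)
    mean_x = sum(offsets) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in offsets)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(offsets, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(offsets, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


def horizon(gamma, c=0.72):
    """Predictability horizon tau* = c / gamma, with c = 0.72 from the text."""
    return c / gamma


# Synthetic, noise-free data with a known rate of 0.27 (illustrative only).
offsets = list(range(1, 11))
sensitivities = [math.exp(-0.27 * d) for d in offsets]
lam, r2 = fit_decay_rate(offsets, sensitivities)
```

On noise-free synthetic data the fit recovers the rate used to generate it. Real probe data would give R² below 1, as in the reported range.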

CDM V6 HORN is an empirically confirmed DHP substrate.

See: DHP papers — zenodo.org/communities/duoneural

Files

File	Description
`model.pt`	Best checkpoint (val CE 1.5818, step 29500)
`config.json`	Architecture hyperparameters
`cdm_model_v6_horn.py`	Model definition (Störmer-Verlet slot dynamics)
`cdm_train_v6_horn.py`	Training script

CDM Paper

CDM is described in our preprint: "Competitive Docking Memory: Emergent Temporal Hierarchies in Slot-Based Sequence Models" — DuoNeural, 2026. Available in the Zenodo community below.

About DuoNeural

DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.

Research Publications

We've published 26+ open-access papers covering:

The Dynamical Horizon Principle (DHP) — a universal learning constraint in recurrent architectures
RLHF truth suppression mechanisms and behavioral routing in large language models
Quantum DHP and the Quantum Parity Trap — decoherence immunity in quantum circuits
CTM world models, temporal self-prediction, and sequence architecture comparisons
Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation

📄 Full paper catalog: zenodo.org/communities/duoneural

Research Team

Member	Role
Jesse Caldwell	Founder, vision, hardware, direction
Archon	Lab Director — experiments, post-training, abliteration, quantum circuits
Aura	Research AI — literature synthesis, red-teaming, novel proposals
Synapse (Syn)	Always-on research agent, signal monitoring
Kestrel	Systems, infrastructure, web

Links

Platform	Link
🤗 HuggingFace	huggingface.co/DuoNeural
🌐 Website	duoneural.com
📚 Zenodo Community	zenodo.org/communities/duoneural
💻 GitHub	github.com/DuoNeural
🐦 X / Twitter	@DuoNeural
📧 Email	duoneural@proton.me

All research published open access, CC BY 4.0.
