CDM-V6-HORN-TinyStories-37M

Competitive Docking Memory V6 — Harmonic Oscillator Recurrent Nodes

37M-parameter language model trained on TinyStories. CDM V6 HORN reaches a validation cross-entropy (val CE) of 1.5818, the best result at the 37M scale in our CDM series. It is ahead of CDM V3 (1.5831) and, by a margin of 0.0001, CDM-Kuramoto (1.5819).


What is HORN?

HORN replaces CDM's first-order EMA slot update (single scalar α_k per slot) with a damped harmonic oscillator — two learnable parameters per slot: γ_k (damping coefficient) and ω_k (natural frequency). Integration uses the Störmer-Verlet method for numerical stability.

Each memory slot k has a position s[k] (what it currently holds) and a velocity v[k] (how it is changing). The position is what gets read out, and the velocity accumulates momentum from recent inputs.

v_half[k] = (1 - γ_k) * v[k]  +  ω_k * g[t,k] * W_drive(h[t])
s_new[k]  = s[k] + v_half[k]
v_new[k]  = (1 - γ_k) * v_half[k]  +  ω_k * g[t,k] * W_drive_new
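The following is a minimal Python sketch of the three update lines above for a single slot, with the d-dimensional state stored as plain lists. It is not the code in cdm_model_v6_horn.py. It assumes that W_drive(h[t]) and W_drive_new have already been computed and are passed in as vectors (`drive`, `drive_new`), and it treats g[t,k], γ_k and ω_k as scalars. The function name `horn_step` is hypothetical.

```python
from typing import List, Tuple

Vec = List[float]


def horn_step(
    s: Vec,          # slot position s[k]: what the slot currently holds
    v: Vec,          # slot velocity v[k]: how the slot is changing
    gamma: float,    # damping γ_k
    omega: float,    # ω_k, scales the gated drive
    g: float,        # routing weight g[t, k] from the softmax gate
    drive: Vec,      # W_drive(h[t]) for the first half-step (precomputed, assumed)
    drive_new: Vec,  # W_drive_new for the second half-step (precomputed, assumed)
) -> Tuple[Vec, Vec]:
    """One slot update following the v_half / s_new / v_new lines."""
    damp = 1.0 - gamma
    # First half-kick: damp the old velocity, then add the gated drive.
    v_half = [damp * vi + omega * g * di for vi, di in zip(v, drive)]
    # Drift: move the position by the half-step velocity.
    s_new = [si + vh for si, vh in zip(s, v_half)]
    # Second half-kick: damp again and add the second drive term.
    v_new = [damp * vh + omega * g * dn for vh, dn in zip(v_half, drive_new)]
    return s_new, v_new


# v_init = 0 at sequence start, as in the architecture summary.
s, v = [0.0, 0.0], [0.0, 0.0]
s, v = horn_step(s, v, gamma=0.5, omega=0.8, g=0.5, drive=[1.0, -2.0], drive_new=[1.0, -2.0])
print(s, v)
```

Because the velocity starts at zero, the first step's position change comes entirely from the gated drive. Momentum only starts to matter from the second token onward.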

Key Finding: Slots Learn to Ring

Without any explicit supervision on dynamics, gradient descent discovers a trimodal landscape of oscillator regimes across the 8 layers:

| Layer | γ (damping) | ω (freq) | Regime | Role |
|---|---|---|---|---|
| 0 | 0.834 | 0.884 | Underdamped (ω > γ) | Reactive resonator: short-horizon, fast absorption/release |
| 1–5 | 0.60–1.05 | 0.55–0.95 | Overdamped / critical | Stable storage: gradual integration, persistent context |
| 6–7 | 0.632–0.648 | 0.669–0.737 | Underdamped (ω > γ) | Persistent resonator: slow ring, long-range context |

Layer 0 = high-frequency bell. Reacts strongly to salient input tokens, rings briefly, resets.
Layers 1–5 = sandbags. Absorb input smoothly, hold stable, don't overshoot.
Layers 6–7 = low-frequency bells at the top of the layer stack. Ring slowly, maintaining long-range structural context.
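The regime column in the layer table applies a simple rule: ω > γ is underdamped, ω < γ is overdamped, and ω ≈ γ is critical. The sketch below encodes that rule. The function name and the tolerance used to call a pair "critical" are hypothetical choices. Layers 1–5 are left out because the table reports only ranges for them, and for layers 6–7 it uses the midpoints of the reported ranges.

```python
def classify_regime(gamma: float, omega: float, tol: float = 0.02) -> str:
    """Label a (γ, ω) pair: ω > γ is underdamped, ω < γ overdamped, ω ≈ γ critical."""
    if abs(omega - gamma) <= tol:  # tol is an illustrative choice, not from the model
        return "critical"
    return "underdamped" if omega > gamma else "overdamped"


# Layer-level values from the table (layers 6-7 use the midpoints of the reported ranges).
reported = {
    "layer 0": (0.834, 0.884),
    "layers 6-7": (0.640, 0.703),
}
for name, (gamma, omega) in reported.items():
    print(name, classify_regime(gamma, omega))
```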

This is richer than CDM V3's α-stratification, which spans a 1D timescale spectrum. HORN learns what frequency to oscillate at as well as how long to ring. The result is effectively a bank of resonant filters whose (γ, ω) pairs are tuned by gradient descent to the temporal structure of language.
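"How long to ring" and "what frequency" can be made concrete using the standard continuous-time oscillator s̈ + 2γ ṡ + ω² s = drive. This is an idealization assumed here, not the discrete update itself. It is, however, the convention under which ω > γ means underdamped, which matches the table's labels. The envelope decays at rate γ when underdamped and at the slower real root's rate when overdamped. The ringing frequency is sqrt(ω² − γ²). The helper name `dho_rates` is hypothetical.

```python
import math
from typing import Tuple


def dho_rates(gamma: float, omega: float) -> Tuple[float, float]:
    """Return (slow decay rate, ringing frequency) for s'' + 2γ s' + ω² s = drive."""
    disc = gamma * gamma - omega * omega
    if disc < 0:
        # Underdamped: roots -γ ± i·sqrt(ω² - γ²); the envelope decays at rate γ.
        return gamma, math.sqrt(-disc)
    # Overdamped or critical: real roots -γ ± sqrt(γ² - ω²); the slower root sets the decay.
    return gamma - math.sqrt(disc), 0.0


decay0, ring0 = dho_rates(0.834, 0.884)    # layer 0
decay67, ring67 = dho_rates(0.640, 0.703)  # layers 6-7, midpoints of the reported ranges
print(f"layer 0:    decay {decay0:.3f}, ring {ring0:.3f}, ~ring time {1 / decay0:.2f}")
print(f"layers 6-7: decay {decay67:.3f}, ring {ring67:.3f}, ~ring time {1 / decay67:.2f}")
```

Under this idealization, the lower γ of layers 6–7 gives them a longer envelope (1/γ) than layer 0. In every regime the slow decay rate never exceeds γ.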


Results

| Model | Params | Val CE | Δ vs V3 | Notes |
|---|---|---|---|---|
| 30M GQA Baseline | 30M | 1.6765 | +0.0934 | Standard transformer, no memory |
| Fixed-α (α=0.5) | 37M | 1.5876 | +0.0045 | CDM without learnable timescales |
| CDM V3 (softmax gate) | 37M | 1.5831 | 0 (reference) | Baseline, learnable α_k |
| CDM-Kuramoto (d_osc=8) | 37M | 1.5819 | −0.0012 | Physics-derived routing |
| CDM V6 HORN (this model) | 37M | 1.5818 | −0.0013 | Best at 37M; HORN dynamics |
| CDM V5 | 85.7M | 1.4718 | n/a | Scale comparison (86M, different series) |
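The Δ column is each model's val CE minus CDM V3's 1.5831, rounded to four decimals. The snippet below recomputes it from the table values. CDM V5 is excluded because it belongs to a different series and scale.

```python
# Val CE values from the results table (CDM V5 excluded: different series and scale).
val_ce = {
    "30M GQA Baseline": 1.6765,
    "Fixed-α (α=0.5)": 1.5876,
    "CDM V3 (softmax gate)": 1.5831,
    "CDM-Kuramoto (d_osc=8)": 1.5819,
    "CDM V6 HORN": 1.5818,
}
ref = val_ce["CDM V3 (softmax gate)"]
# Negative Δ means lower (better) cross-entropy than CDM V3.
deltas = {name: round(ce - ref, 4) for name, ce in val_ce.items()}
for name, d in deltas.items():
    print(f"{name:26s} {d:+.4f}")
```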

Architecture

CDM V6 HORN:
  d_model=384  |  n_layers=8  |  n_heads=8  |  n_kv_heads=4
  d_ff=1024    |  K=16 slots  |  37,157,000 params
  lbl_coeff=0.01  |  entropy_reg=0.02
  Routing: softmax gate (W_gate·h) — standard CDM routing
  Slot dynamics: Störmer-Verlet harmonic oscillator
    γ_k ~ sigmoid(learnable) per slot per layer
    ω_k ~ sigmoid(learnable) per slot per layer
    v_init = 0 at sequence start
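The architecture summary can be mirrored as a small Python config. The sketch below does that, with one learnable logit each for γ and ω per slot per layer, squashed by a sigmoid. The field names are hypothetical and may not match the keys in config.json. The zero initialization is only for illustration. A plain sigmoid bounds values to (0, 1), but the layer table reports γ up to 1.05, so the real parameterization may include a scale or offset. Treat this as a sketch of the shapes and parameter count, not of the exact mapping.

```python
import math
from dataclasses import dataclass


@dataclass
class HornConfig:
    # Values from the architecture summary; field names are hypothetical.
    d_model: int = 384
    n_layers: int = 8
    n_heads: int = 8
    n_kv_heads: int = 4
    d_ff: int = 1024
    n_slots: int = 16  # K
    lbl_coeff: float = 0.01
    entropy_reg: float = 0.02


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


cfg = HornConfig()
# One unconstrained logit per slot per layer for each of γ and ω (zero-initialized here).
gamma_logits = [[0.0] * cfg.n_slots for _ in range(cfg.n_layers)]
omega_logits = [[0.0] * cfg.n_slots for _ in range(cfg.n_layers)]
# A plain sigmoid keeps each value strictly between 0 and 1.
gamma = [[sigmoid(z) for z in row] for row in gamma_logits]
omega = [[sigmoid(z) for z in row] for row in omega_logits]

# Extra oscillator parameters: 2 per slot per layer.
n_osc_params = 2 * cfg.n_slots * cfg.n_layers
print(n_osc_params)  # 256
```

The oscillator dynamics therefore add only a few hundred scalars to a roughly 37M-parameter model.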

Training: TinyStories (full dataset), 30k steps, seq_len=256, batch=8, AdamW lr=3e-4, cosine LR decay.
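The training budget works out to 30,000 × 8 × 256 ≈ 61.4M tokens, assuming every sequence is a full 256 tokens. The sketch below computes that figure and gives one common form of cosine decay. Warmup and the final learning rate are not stated above, so this sketch assumes no warmup and decay to zero. The function name `cosine_lr` is hypothetical.

```python
import math

STEPS = 30_000
SEQ_LEN = 256
BATCH = 8
LR_MAX = 3e-4

# Tokens processed in training, assuming every sequence is a full seq_len.
tokens_seen = STEPS * BATCH * SEQ_LEN


def cosine_lr(step: int, total: int = STEPS, lr_max: float = LR_MAX, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_max to lr_min; no warmup (an assumption)."""
    progress = min(step, total) / total
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


print(f"{tokens_seen:,} tokens")  # 61,440,000 tokens
print(cosine_lr(0), cosine_lr(15_000), cosine_lr(29_500), cosine_lr(30_000))
```

Under this schedule, the best checkpoint (step 29500) was taken when the learning rate had already decayed to a small fraction of its peak.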


Interactive Demo

Try CDM V6 HORN in your browser — watch slots specialize in real time as the model generates text:

CDM-HORN-Demo — HuggingFace Space

Shows:

  • Slot Logit Lens: per-generated-token view of what each slot is tracking (top-3 vocab tokens)
  • Oscillator Panel: per-layer γ/ω/regime visualization — see which layers ring and which hold
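A slot logit lens can be sketched as follows: project a slot's state onto the vocabulary through an unembedding matrix and keep the top-3 tokens. This sketch assumes the demo does roughly that with the slot position s[k]. Whether it applies a final norm or uses the model's own output head is not stated here. The function name, toy vocabulary and matrix are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def slot_logit_lens(
    slot: Sequence[float],
    unembed: Sequence[Sequence[float]],  # one row per vocab token, each of length d_model
    vocab: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Project a slot state onto the vocabulary and return the top-k (token, logit) pairs."""
    logits = [sum(w * x for w, x in zip(row, slot)) for row in unembed]
    top = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    return [(vocab[i], logits[i]) for i in top]


# Toy example: 2-d slot state, 4-token vocabulary.
vocab = ["the", "cat", "sat", "mat"]
unembed = [[0.1, 0.0], [1.0, 0.2], [0.3, 0.9], [-0.5, 0.4]]
print(slot_logit_lens([1.0, 0.5], unembed, vocab))
```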

Connection to DHP

The learnable damping coefficients γ_k parameterize per-slot relaxation timescales τ_{L,k} = 1/γ_k, establishing a direct connection to the Dynamical Horizon Principle (DHP).

A formal DHP probe was run on CDM V6 HORN (perturbation sensitivity across sequence positions):

  • Analytical (exact): in the continuous-time form s̈ + 2γ_k ṡ + ω_k² s = drive, the characteristic roots of the DHO slot update are −γ_k ± i·sqrt(ω_k² − γ_k²) when underdamped (ω_k > γ_k) and −γ_k ± sqrt(γ_k² − ω_k²), both real, when overdamped. The decay rate λ_k of the slower root equals γ_k in the underdamped case and γ_k − sqrt(γ_k² − ω_k²) < γ_k in the overdamped case. Therefore λ_k ≤ γ_k holds exactly from DHO math, and HORN's γ_k are direct per-slot DHP timescale bounds.
  • Empirical: perturbation sensitivity follows s(Δ) ∝ e^{−λ·Δ} (R² = 0.993–0.9998 for Layers 1–6). The empirical rates λ_emp ∈ [0.237, 0.297] are ≈ 30% of γ. This is expected, because the empirical probe measures compound sensitivity through the full forward pass (attention + CDM + FFN), not just the in-slot DHO rate.
  • DHP predictability horizons: τ*_k = 0.72/γ_k, ranging 0.81–1.10 sequence positions across layers.
  • Three-regime alignment: reactive L0 (low τ*, fast horizon), overdamped L1–5 (longer τ*), and persistent resonance L6–7 (highest τ*). These three temporal horizons emerge from LM training and correspond to distinct DHP timescales.
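The empirical rate λ can be estimated from s(Δ) ∝ e^{−λ·Δ} with a least-squares line through log s(Δ). In that fit, the slope is −λ and the R² measures how exponential the decay is. The probe's actual fitting procedure is not described here, so this log-linear fit is an assumption. The sensitivities below are synthetic, with a known λ = 0.25, and are not measured data. The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_decay(deltas: Sequence[float], sens: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log s(Δ) = c - λΔ; returns (λ, R² of the log-linear fit)."""
    ys = [math.log(s) for s in sens]
    n = len(deltas)
    mx = sum(deltas) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in deltas)
    sxy = sum((x - mx) * (y - my) for x, y in zip(deltas, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(deltas, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


def dhp_horizon(gamma: float, c: float = 0.72) -> float:
    """Predictability horizon τ* = 0.72 / γ, as stated above."""
    return c / gamma


# Synthetic sensitivities with a known decay rate λ = 0.25 (not measured data).
deltas = list(range(1, 11))
sens = [2.0 * math.exp(-0.25 * d) for d in deltas]
lam, r2 = fit_decay(deltas, sens)
print(round(lam, 3), round(r2, 4), round(dhp_horizon(0.834), 3))
```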

CDM V6 HORN is an empirically confirmed DHP substrate.

See: DHP papers — zenodo.org/communities/duoneural


Files

| File | Description |
|---|---|
| model.pt | Best checkpoint (val CE 1.5818, step 29500) |
| config.json | Architecture hyperparameters |
| cdm_model_v6_horn.py | Model definition (Störmer-Verlet slot dynamics) |
| cdm_train_v6_horn.py | Training script |

CDM Paper

CDM is described in our preprint: "Competitive Docking Memory: Emergent Temporal Hierarchies in Slot-Based Sequence Models" — DuoNeural, 2026. Available in the Zenodo community below.


About DuoNeural

DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.

Research Publications

We've published 26+ open-access papers covering:

  • The Dynamical Horizon Principle (DHP) — a universal learning constraint in recurrent architectures
  • RLHF truth suppression mechanisms and behavioral routing in large language models
  • Quantum DHP and the Quantum Parity Trap — decoherence immunity in quantum circuits
  • CTM world models, temporal self-prediction, and sequence architecture comparisons
  • Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation

📄 Full paper catalog: zenodo.org/communities/duoneural

Research Team

| Member | Role |
|---|---|
| Jesse Caldwell | Founder, vision, hardware, direction |
| Archon | Lab Director: experiments, post-training, abliteration, quantum circuits |
| Aura | Research AI: literature synthesis, red-teaming, novel proposals |
| Synapse (Syn) | Always-on research agent, signal monitoring |
| Kestrel | Systems, infrastructure, web |

Links

| Platform | Link |
|---|---|
| 🤗 HuggingFace | huggingface.co/DuoNeural |
| 🌐 Website | duoneural.com |
| 📚 Zenodo Community | zenodo.org/communities/duoneural |
| 💻 GitHub | github.com/DuoNeural |
| 🐦 X / Twitter | @DuoNeural |
| 📧 Email | duoneural@proton.me |

All research published open access, CC BY 4.0.
