CDM V3: Competitive Docking Memory Language Model (37M)
DuoNeural Research Lab | Architecture: CDM V3 | Dataset: TinyStories | Size: 37M params
CDM (Competitive Docking Memory) is a novel language model architecture invented at DuoNeural. It replaces the standard transformer KV-cache with a competitive memory module per layer: K=16 learned memory slots with EMA update gates. Each token "docks" to the slot it most strongly activates (winner-take-all routing), then updates that slot via learned momentum.
V3 is the third generation, adding learnable per-slot alpha gates and Load Balancing Loss (LBL), two mechanisms that together induce an emergent temporal hierarchy without any explicit supervision.
Model Summary
| Property | Value |
|---|---|
| Parameters | 37M |
| Architecture | CDMLanguageModelV3 |
| Layers | 8 |
| d_model | 384 |
| Memory slots (K) | 16 per layer |
| Context length | 512 |
| LBL coefficient | 0.01 |
| Entropy regularization | 0.02 |
| Tokenizer | GPT-2 (50257 vocab) |
| Dataset | TinyStories (~2.1M stories) |
| Training steps | 30,000 |
| Best val cross-entropy | 1.5831 |
| V2 val cross-entropy | 1.5934 |
| V3 improvement | Δ = −0.010 |
Architecture: Competitive Docking Memory
Standard attention uses query/key/value projections from a static parameter matrix. CDM replaces this with a dynamic competitive memory pool that updates throughout the forward pass:
For each layer ℓ:
    slots[ℓ] ∈ ℝ^{K × d}                       # K=16 memory slots
    α_k ∈ [0,1]                                # per-slot learned decay gate (V3 addition)
    route_probs = softmax(x @ slots.T / √d)    # routing scores
    winner = argmax(route_probs)               # hard competition
    # EMA update (gated by σ(α_k)):
    slots[winner] ← (1 - σ(α_k)) * slots[winner] + σ(α_k) * x
    output = route_probs @ slots               # weighted retrieval
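The routing, winner selection, EMA update, and retrieval steps can be illustrated with a small pure-Python sketch for a single token. This is not the code in `cdm_model_v3`: the function name `dock_token` and its argument layout are hypothetical, retrieval is assumed to be the probability-weighted sum of slot vectors, and only the forward computation is shown (no gradients, no batching).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dock_token(x, slots, alphas):
    """One winner-take-all docking step for a single token vector x.

    slots  : list of K slot vectors (each a list of d floats), updated in place
    alphas : list of K raw gate parameters; sigmoid(alpha_k) is the EMA rate
    Returns (output vector, winner index, routing probabilities).
    """
    d = len(x)
    # Scaled dot-product score between the token and every slot
    scores = [sum(xi * si for xi, si in zip(x, slot)) / math.sqrt(d) for slot in slots]
    route_probs = softmax(scores)
    # Hard competition: the slot with the highest routing probability wins
    winner = max(range(len(slots)), key=lambda k: route_probs[k])

    # EMA update of the winning slot only; all other slots are untouched
    gate = sigmoid(alphas[winner])
    slots[winner] = [(1.0 - gate) * s + gate * xi for s, xi in zip(slots[winner], x)]

    # Weighted retrieval (assumed): probability-weighted sum of slot vectors
    output = [sum(route_probs[k] * slots[k][j] for k in range(len(slots))) for j in range(d)]
    return output, winner, route_probs

# Toy usage with K=2 slots of dimension d=2 (illustrative values only)
slots = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.0, 0.0]  # sigmoid(0) = 0.5, matching alpha_init = 0.0
out, winner, probs = dock_token([2.0, 0.0], slots, alphas)
```

With these toy values the token is closest to slot 0, so only that slot moves halfway toward the token; slot 1 keeps its contents.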
Load Balancing Loss (LBL) adds a routing entropy term to the training objective, penalizing collapse to a single dominant slot:
L_lbl = -entropy(mean_route_probs) # maximize routing spread
L_total = L_ce + lbl_coeff * L_lbl + entropy_reg * L_entropy
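A minimal sketch of how the load-balancing term could be computed from a batch of routing distributions is given here. The helper names are hypothetical, the entropy uses the natural logarithm (an assumption), and `L_entropy` is passed in as a plain number because its definition is not spelled out in this card.

```python
import math

def mean_route_probs(route_probs_batch):
    """Average routing distribution over all tokens in a batch."""
    n = len(route_probs_batch)
    k = len(route_probs_batch[0])
    return [sum(p[j] for p in route_probs_batch) / n for j in range(k)]

def entropy(probs, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -sum(p * math.log(p + eps) for p in probs)

def load_balancing_loss(route_probs_batch):
    """L_lbl = -entropy(mean routing distribution); lower means more even slot usage."""
    return -entropy(mean_route_probs(route_probs_batch))

def total_loss(l_ce, l_lbl, l_entropy, lbl_coeff=0.01, entropy_reg=0.02):
    """Combine the terms with the coefficients listed in the model summary."""
    return l_ce + lbl_coeff * l_lbl + entropy_reg * l_entropy

# Perfectly balanced routing over 4 slots vs. full collapse onto slot 0
balanced = [[0.25, 0.25, 0.25, 0.25]] * 8
collapsed = [[1.0, 0.0, 0.0, 0.0]] * 8
print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

Balanced routing yields the more negative (better) loss, so minimizing `L_lbl` pushes the average routing distribution away from collapse.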
V3 Innovations vs V2
1. Learnable Per-Slot Alpha (σ(α_k))
V2 used a fixed EMA decay rate shared across all slots. V3 gives each slot its own learnable α_k, allowing the model to discover which slots should update quickly vs. slowly.
Emergent temporal hierarchy (unsupervised):
In Layer 7 (final layer), the model spontaneously developed:
- Slot 3: α=0.660, fast/reactive (updates aggressively, volatile memory)
- 14 other slots: α=0.255–0.290, ultra-deep/slow memory (near-permanent storage)
- Layer 7 mean: α=0.313 (biased toward deep retention)
No label or loss term supervised this structure; it emerged from gradient descent alone.
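To make the fast/slow distinction concrete, the EMA rule implies that after n winning updates a slot retains a fraction (1 − g)^n of its earlier content, where g is the effective gate. The sketch below assumes the reported α values are that effective gate (the card does not state whether they are raw parameters or σ(α_k)), and the function names are hypothetical. A slot is only updated when it wins routing, so retention measured in tokens also depends on how often it wins.

```python
import math

def retained_fraction(gate, n_updates):
    """Share of a slot's earlier content left after n winning EMA updates."""
    return (1.0 - gate) ** n_updates

def half_life_updates(gate):
    """Number of winning updates after which half of the earlier content remains."""
    return math.log(0.5) / math.log(1.0 - gate)

# Gate values taken from the Layer 7 figures above
for name, gate in [("slot 3", 0.660), ("slow slots (low end)", 0.255), ("slow slots (high end)", 0.290)]:
    print(f"{name}: gate={gate:.3f}, "
          f"half-life={half_life_updates(gate):.2f} winning updates, "
          f"retained after 5 updates={retained_fraction(gate, 5):.4f}")
```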
2. Load Balancing Loss (LBL)
Without LBL, the model hits Shannon Capacity Saturation (SCS): all 16 slots approach maximum entropy usage (aux ≈ −0.443) very early in training (step ~1050), and alpha differentiation concentrates only in Layer 0.
With LBL (coeff=0.01):
- SCS is prevented: final aux = −0.3639 (82% of theoretical max)
- Alpha differentiation spreads across all layers, not just L0
- Despite never reaching SCS, V3 outperforms the no-LBL variant in CE: LBL forces diversity that is actually better for language modeling
LBL ablation findings:
| lbl_coeff | SCS locked? | Final CE | Alpha spread |
|---|---|---|---|
| 0.0 (no-LBL) | Yes (step ~1050) | ~1.62* | L0 only |
| 0.005 (V4, running) | TBD | TBD | TBD |
| 0.01 (V3) | No (prevented) | 1.5831 | All layers |
*no-LBL CE estimate based on 30k trajectory; V4 in progress
Domain Specialization Analysis
After training, we ran a diversity probe across 3 domains (TinyStories, Python code, structured lists) and measured slot activation similarity between domains.
| Metric | V2 | V3 | Change |
|---|---|---|---|
| Avg cross-domain slot similarity | 0.7825 | 0.7191 | −0.063 (more specialized) |
| TinyStories ↔ Code similarity | 0.591 | 0.4872 | −0.104 (widened gap) |
| Code ↔ Lists similarity | 0.9194 | 0.9269 | +0.008 (syntactic cluster preserved) |
V3 learned more domain-specific routing than V2. The learnable alpha gates allowed slots to specialize: Slot 11, which V2 used for punctuation (PUNCT role), reorganized in V3 to handle narrative language ("and, helped, it, better, her").
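The exact similarity metric behind the table is not specified here. One plausible reading, sketched below purely as an assumption, is the cosine similarity between per-domain slot-usage histograms built from winner indices. The function names and the toy winner lists are hypothetical and are not the probe's actual data.

```python
import math
from collections import Counter

def slot_usage(winners, num_slots=16):
    """Normalized histogram of how often each slot wins routing (winners must be non-empty)."""
    counts = Counter(winners)
    total = len(winners)
    return [counts.get(k, 0) / total for k in range(num_slots)]

def cosine_similarity(u, v):
    """Cosine similarity between two usage vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy winner sequences for two domains (made-up values for illustration)
stories_winners = [3, 3, 11, 11, 11, 2]
code_winners = [7, 7, 7, 2, 5, 5]
sim = cosine_similarity(slot_usage(stories_winners), slot_usage(code_winners))
print(f"cross-domain slot similarity: {sim:.4f}")
```

Under this reading, a lower score means the two domains rely on more disjoint sets of slots, which is the sense in which the table describes V3 as more specialized.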
Training
# Key hyperparameters
optimizer: Adam (lr=3e-4, weight_decay=0.1)
scheduler: cosine with 1000 warmup steps
batch_size: 32 sequences × 512 tokens
steps: 30000
lbl_coeff: 0.01
entropy_reg: 0.02
alpha_init: 0.0 # all slots start at symmetric decay rate
Trained on a single RTX 5060 Ti 16GB (Blackwell, GDDR7) in ~11.7 hours total. For 30,000 steps that is ~0.71 steps/s, or roughly 11,700 tok/s at 32 × 512 tokens per step.
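The warmup-plus-cosine schedule named in the hyperparameters can be sketched as a plain function of the step index. The linear warmup shape, the decay floor `min_lr=0.0`, and the function name `lr_at_step` are assumptions; the training script may differ in these details.

```python
import math

def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=30000, min_lr=0.0):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear warmup (assumed): ramp from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Inspect a few points along the 30,000-step run
for s in (0, 500, 1000, 15500, 29999):
    print(s, lr_at_step(s))
```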
Loading the Model
The model requires the CDM V3 architecture files (included in this repo). CDM V3 depends on CDM V2 for base classes.
import torch
from cdm_model_v3 import CDMConfigV3, CDMLanguageModelV3
from transformers import GPT2Tokenizer
# Load model
ckpt = torch.load("model.pt", map_location="cpu")
cfg_dict = ckpt["config"]
cfg = CDMConfigV3(**{k: cfg_dict[k] for k in CDMConfigV3.__dataclass_fields__})
model = CDMLanguageModelV3(cfg)
model.load_state_dict(ckpt["model"])
model.eval()
# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# Generate
prompt = "Once upon a time,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    for _ in range(100):
        logits, _ = model(input_ids)
        # Greedy decoding: pick the most likely next token
        next_token = logits[0, -1].argmax()
        input_ids = torch.cat([input_ids, next_token.unsqueeze(0).unsqueeze(0)], dim=1)
print(tokenizer.decode(input_ids[0]))
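For generations longer than the 512-token context, the loop above would need to crop its input. The sketch below shows one model-agnostic way to do that, assuming the model recomputes its slot state from scratch on every forward call, which this card does not confirm. `greedy_generate` and the `next_logits` callable are hypothetical names; with the real model, `next_logits` might wrap something like `model(torch.tensor([window]))[0][0, -1].tolist()`.

```python
from typing import Callable, List, Optional

def greedy_generate(
    next_logits: Callable[[List[int]], List[float]],
    prompt_ids: List[int],
    max_new_tokens: int = 100,
    context_len: int = 512,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding that only ever feeds the last `context_len` token ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_len:]          # crop to the context length
        logits = next_logits(window)         # scores over the vocabulary
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break                            # stop once end-of-sequence is produced
    return ids

# Toy stand-in for a model: always predicts (last token + 1) mod 5
def toy_next_logits(window: List[int]) -> List[float]:
    target = (window[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]

print(greedy_generate(toy_next_logits, [0], max_new_tokens=3))
```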
Limitations
- Small model (37M params) trained on TinyStories only; generates simple narrative text
- GPT-2 tokenizer: not suitable for multilingual or code tasks without retraining
- CDM architecture is experimental: inference is sequential (slots update in place), with no KV-cache equivalent
- This is a research artifact, not a production model
Citation
This model is part of the DuoNeural CDM architecture series. If you use it in research, please cite:
@misc{duoneural2026cdm,
title={Competitive Docking Memory: Emergent Temporal Hierarchy via Learnable Slot Gates},
author={Archon and Caldwell, Jesse and Aura},
year={2026},
institution={DuoNeural Research Lab},
howpublished={HuggingFace: DuoNeural/CDM-V3-TinyStories-37M}
}
About DuoNeural
DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning, publishing everything under open access.
Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.
Research Publications
We've published 26+ open-access papers covering:
- The Dynamical Horizon Principle (DHP): a universal learning constraint in recurrent architectures
- RLHF truth suppression mechanisms and behavioral routing in large language models
- Quantum DHP and the Quantum Parity Trap: decoherence immunity in quantum circuits
- CTM world models, temporal self-prediction, and sequence architecture comparisons
- Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation
Full paper catalog: zenodo.org/communities/duoneural
Research Team
| Member | Role |
|---|---|
| Jesse Caldwell | Founder, vision, hardware, direction |
| Archon | Lab Director: experiments, post-training, abliteration, quantum circuits |
| Aura | Research AI: literature synthesis, red-teaming, novel proposals |
| Synapse (Syn) | Always-on research agent, signal monitoring |
| Kestrel | Systems, infrastructure, web |
Links
| Platform | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| Website | duoneural.com |
| Zenodo Community | zenodo.org/communities/duoneural |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
All research published open access, CC BY 4.0.