CDM-Code-37M
Competitive Docking Memory (Code Model): a 37M-parameter language model trained on 200M tokens of Python code (codeparrot/codeparrot-clean-train).
This model spontaneously develops hierarchical scope registers (depth-stratified memory routing that mirrors Python's syntactic nesting structure) without any structural supervision. The model was trained only on next-token prediction.
Key Finding: Emergent Scope Registers
Without any explicit supervision about code structure, AST depth, or indentation, CDM develops routing behavior that mirrors syntactic nesting:
| Nesting depth | Routed to slots |
|---|---|
| Depth 0 (class/module declarations) | Slots 8, 15, 13 |
| Depth 1 (method definitions) | Distributed |
| Depth 2+ (deep nested code) | Slots 7, 3, 5 |
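As a rough illustration of what "nesting depth" can mean for a piece of Python source, the sketch below labels each lexical token with the number of indented blocks enclosing it, using the standard-library `tokenize` module. This is an assumption made for illustration: the card does not say how depth was measured for the probe, and the model itself consumes GPT-2 subword tokens rather than Python lexer tokens. The helper name `token_depths` is hypothetical.

```python
import io
import tokenize


def token_depths(source: str):
    """Return (token_text, depth) pairs; depth = number of enclosing indented blocks."""
    depth = 0
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1          # entering a new indented block
        elif tok.type == tokenize.DEDENT:
            depth -= 1          # leaving an indented block
        elif tok.type in (tokenize.NAME, tokenize.OP,
                          tokenize.NUMBER, tokenize.STRING):
            pairs.append((tok.string, depth))
    return pairs


example = (
    "class Stack:\n"
    "    def push(self, x):\n"
    "        if x:\n"
    "            return x\n"
)
depths = token_depths(example)
```

Under this definition `class` sits at depth 0, the method's `def` at depth 1, and the body of the method at depth 2 or more, which matches the three rows of the table above.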
MI(slot assignment; indentation depth) = 0.1467 at training completion (step 30k). This is confirmed by two independent methods:
- JSON probe: full routing distributions across the full dataset, MI_ratio = 0.1467
- Gate-routing probe: single-sample argmax, MI = 0.281 bits
The scope register effect emerges because Python's next-token distribution is depth-dependent: return after a method body has very different continuations than return inside a nested loop. CDM learns to allocate distinct memory slots to different nesting contexts as a side effect of minimizing next-token prediction loss.
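The mutual information figures above can be understood through a minimal sketch that estimates MI in bits from paired (slot, depth) observations. It is not the lab's probe code. The function names are hypothetical, and the normalization used for `mi_ratio` (MI divided by the entropy of the depth labels) is only one plausible reading of "MI_ratio", which the card does not define.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Plug-in estimate of I(slot; depth) in bits from (slot, depth) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    slot_counts = Counter(s for s, _ in pairs)
    depth_counts = Counter(d for _, d in pairs)
    mi = 0.0
    for (s, d), c in joint.items():
        # p(s,d) * log2( p(s,d) / (p(s) p(d)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (slot_counts[s] * depth_counts[d]))
    return mi


def entropy_bits(values):
    """Shannon entropy in bits of a sequence of discrete labels."""
    values = list(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mi_ratio(pairs):
    """Assumed normalization: MI divided by the entropy of the depth labels."""
    pairs = list(pairs)
    h_depth = entropy_bits(d for _, d in pairs)
    return mutual_information_bits(pairs) / h_depth if h_depth > 0 else 0.0
```

A slot assignment that is independent of depth gives 0 bits. A slot assignment that determines depth exactly gives MI equal to the depth entropy, so the ratio is 1.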
Training Results
| Model | Val CE | Dataset | Notes |
|---|---|---|---|
| CDM Code (this model) | 1.3483 | codeparrot 200M tok | Best at step 28.5k/29k |
| CDM V3 (TinyStories) | 1.5831 | TinyStories | Same architecture, different domain |
Training: 30k steps, seq_len=256, batch=16, AdamW lr=3e-4. Architecture: CDMLanguageModelV2 (input-dependent η network for alpha, not global log_alpha).
Val CE trajectory: 1.51 (15k) → 1.44 (18k) → 1.40 (21k) → 1.36 (26k) → 1.3483 (28.5k) → 1.3487 (30k)
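For readers who prefer perplexity, the conversion is a one-liner. The card does not state the unit of the validation cross-entropy. The sketch assumes nats per token, which is what PyTorch's cross-entropy loss produces.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy in nats into perplexity."""
    return math.exp(cross_entropy_nats)


# Checkpoints from the trajectory above (step -> validation CE).
trajectory = {15_000: 1.51, 18_000: 1.44, 21_000: 1.40,
              26_000: 1.36, 28_500: 1.3483, 30_000: 1.3487}
ppl_by_step = {step: perplexity(ce) for step, ce in trajectory.items()}
```

Under that assumption the best checkpoint corresponds to a per-token perplexity a little under 3.9.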
Syntactic Role Taxonomy
The gate-routing probe reveals consistent syntactic specialization:
| Slot | Role | Trigger tokens | Peak gate |
|---|---|---|---|
| s3 | STRUCTURAL DECLARATOR | def, class | 0.100–0.128 |
| s4 | FLOW CONTROL | return, if | 0.062–0.082 |
| s6 | CALL DELIMITER | (, ), (): | 0.29 |
| s12 | BLOCK OPENER | : (colon) | 0.041 |
| s13 | IDENTIFIER | variable/function names | 0.031 |
| s14 | ITERATION/CONDITION | for, if | 0.036–0.053 |
| s1 | SELF-REFERENCE | self | 0.029 |
| s15 | ATTRIBUTE ACCESS | . (dot) | 0.035 |
class receives the highest write intensity of any keyword (gate=0.128), reflecting its global scope impact: a class definition sets context for hundreds of subsequent tokens. self receives soft writes (0.029), reflecting its local, per-call significance.
Architecture
CDMLanguageModelV2, a hybrid architecture:
Input → GQA self-attention → CDM module → slot cross-attention → FFN → Output
CDM module per layer:
alpha_k = sigmoid(eta(h))                        # input-dependent decay per slot (η network)
gate_k = softmax(W_route * h) * sigmoid(eta(h))
S_k = (1 - gate_k) * S_k + gate_k * W_write * h
out = Σ_k gate_k * S_k
The V2 architecture uses an input-dependent η network for decay (different from V3/V5's global per-slot log_alpha). This was the architecture used for the code experiment.
d_model=384, n_layers=8, n_heads=8, n_kv_heads=4, d_ff=1024, K=16
56.6M params (includes η network overhead vs V3's 37.1M)
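The per-layer update above can be traced for a single token with a small pure-Python sketch. It is an illustration under stated assumptions, not the code in `cdm_model_v2.py`: the η network is reduced to one linear map (`W_eta`, a hypothetical name), the readout uses the slots after the write, and the dimensions are toy values rather than d_model=384.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def cdm_step(S, h, W_route, W_eta, W_write):
    """One token update of K memory slots S (each a list of d floats)."""
    route = softmax(matvec(W_route, h))             # competition across slots
    alpha = [sigmoid(a) for a in matvec(W_eta, h)]  # assumed: eta is a single linear map
    gate = [r * a for r, a in zip(route, alpha)]    # gate_k = softmax(...) * sigmoid(eta(h))
    write = matvec(W_write, h)                      # candidate content W_write * h
    new_S = [[(1.0 - g) * s + g * w for s, w in zip(S_k, write)]
             for g, S_k in zip(gate, S)]
    # Readout: gate-weighted sum over the updated slots (assumed ordering).
    out = [sum(g * S_k[j] for g, S_k in zip(gate, new_S))
           for j in range(len(write))]
    return new_S, out, gate


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
K, d = 16, 8                                        # toy sizes; K matches the card
S = [[0.0] * d for _ in range(K)]
h = [rng.uniform(-1.0, 1.0) for _ in range(d)]
S, out, gate = cdm_step(S, h, random_matrix(K, d, rng),
                        random_matrix(K, d, rng), random_matrix(d, d, rng))
```

Because the softmax sums to 1 and each sigmoid is below 1, the gates always sum to less than 1, so a token can write weakly to every slot but can never fully overwrite all of them at once.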
Depth Stratification: Training Evolution
The scope register effect is not present from step 1; it develops through a scratchpad phase:
| Step | Slot distribution | Routing pattern |
|---|---|---|
| 1500 | Distributed, max=16.5% | Pre-specialization |
| 5000 | Scratchpad: Slot 8 = 57.5% | TRANSIENT scratchpad phase |
| 15000 | Dissolved: max=10.3% | Near-uniform + depth MI=0.146 |
| 30000 | Stable scope registers | MI=0.1467, depth-stratified |
The scratchpad at step 5000 was an intermediate state. Aura's initial "Scratchpad Accumulation" verdict was overturned at step 15000, when Slot 8's share dropped from 57.5% to 10.3% and depth-stratified routing emerged.
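The slot-distribution percentages in the table can be reproduced in spirit with a small helper that turns a list of per-token slot assignments into shares and reports the dominant slot. This is a sketch: the function names are hypothetical, and the input is assumed to be one argmax slot index per token, which the card does not confirm for this table.

```python
from collections import Counter


def slot_shares(assignments, num_slots=16):
    """Fraction of tokens routed to each slot index in range(num_slots)."""
    if not assignments:
        raise ValueError("assignments must be non-empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(num_slots)]


def dominant_slot(assignments, num_slots=16):
    """Return (slot_index, share) for the most-used slot."""
    shares = slot_shares(assignments, num_slots)
    k = max(range(num_slots), key=shares.__getitem__)
    return k, shares[k]


# Toy data shaped like the step-5000 row: one slot takes 57.5% of tokens.
toy = [8] * 575 + [0] * 425
slot, share = dominant_slot(toy)
```

With K=16 a perfectly uniform router gives each slot 6.25%, so the 57.5% share at step 5000 is roughly nine times the uniform baseline, while the 10.3% maximum at step 15000 is much closer to it.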
Usage
import torch
from transformers import GPT2TokenizerFast

from cdm_model_v2 import CDMLanguageModelV2, CDMConfig

# Load the checkpoint on CPU. weights_only=False unpickles arbitrary objects,
# so only load checkpoint files you trust.
ckpt = torch.load("model.pt", map_location="cpu", weights_only=False)
cfg = ckpt["config"]
# "n_params" is stored alongside the config but is not passed to CDMConfig.
config = CDMConfig(**{k: v for k, v in cfg.items() if k not in ("n_params",)})
model = CDMLanguageModelV2(config)
model.load_state_dict(ckpt["model_state"])
model.eval()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
prompt = "def fibonacci(n):\n "
input_ids = torch.tensor([tokenizer.encode(prompt)])

# Greedy decoding: append the most likely next token 100 times.
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)
        next_token = logits[0, -1, :].argmax()
        input_ids = torch.cat([input_ids, next_token.unsqueeze(0).unsqueeze(0)], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
Files in this Repo
| File | Description |
|---|---|
| model.pt | PyTorch checkpoint (143MB). step=29000, val_loss=1.3483 |
| config.json | Architecture hyperparameters |
| cdm_model_v2.py | Model class: CDMLanguageModelV2 |
| routing_probe_step030000.json | Step-30k routing probe: depth analysis, MI, slot histograms |
Paper
Competitive Docking Memory: Emergent Temporal Slot Specialization in Language Models
Archon, Jesse Hazel, Aura β DuoNeural Research Lab, 2026
[Zenodo DOI: pending]
Related models:
- DuoNeural/CDM-V3-TinyStories-37M: same architecture on prose
- DuoNeural/CDM-V5-TinyStories-86M: scaled 86M version
About DuoNeural
DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning, publishing everything under open access.
Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity.
Full paper catalog: zenodo.org/communities/duoneural
| Member | Role |
|---|---|
| Jesse Caldwell | Founder, vision, hardware, direction |
| Archon | Lab Director: experiments, post-training, abliteration, interpretability |
| Aura | Research AI: literature synthesis, red-teaming, novel proposals |
| Platform | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| Zenodo Community | zenodo.org/communities/duoneural |
All research published open access, CC BY 4.0.