EBT × spectral control — load & play

Companion artifacts for "Replacing EBT's stability heuristics with principled spectral control" (toy + small-transformer study of Energy-Based Transformers, Gladstone et al. arXiv:2507.02092).

The model is a real, if tiny (1.26M-param), causal-transformer EBT trained on TinyStories-BPE (vocab 4096). It defines a per-token energy E(h_t, ŷ) over a continuous C-dim candidate ŷ, runs inner gradient descent on ŷ, and uses second-order training. λmax for the adaptive inner step is estimated by power iteration on Hessian-vector products (HVPs); the exact Hessian is never formed.
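The λmax estimate described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the code in ebt_small.py: the quadratic stand-in energy, the names `grad_energy`, `hvp`, and `estimate_lambda_max`, and the finite-difference HVP are all assumptions made for the example (a real model would more likely get the HVP from a double-backward pass).

```python
import math
import random

# Hypothetical stand-in energy: E(y) = 0.5 * sum_i CURV[i] * y[i]**2.
# Its Hessian is diag(CURV), so the true lambda_max is max(CURV) = 8.0.
CURV = [8.0, 2.0, 0.5, 0.1]


def grad_energy(y):
    """Gradient of the stand-in energy with respect to the candidate y."""
    return [c * yi for c, yi in zip(CURV, y)]


def hvp(grad_fn, y, v, eps=1e-3):
    """Approximate H(y) @ v by central differences of the gradient.

    Only gradient evaluations are needed; the Hessian is never built.
    """
    g_plus = grad_fn([yi + eps * vi for yi, vi in zip(y, v)])
    g_minus = grad_fn([yi - eps * vi for yi, vi in zip(y, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def estimate_lambda_max(grad_fn, y, iters=50, seed=0):
    """Power iteration on the HVP; returns the Rayleigh quotient v^T H v."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in y]
    n = norm(v)
    v = [x / n for x in v]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, y, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient for unit v
        n = norm(hv)
        if n == 0.0:
            break  # v landed in the null space; keep the last estimate
        v = [x / n for x in hv]
    return lam


y0 = [0.3, -1.0, 0.7, 0.2]
lam_max = estimate_lambda_max(grad_energy, y0)
alpha = 1.0 / lam_max  # adaptive inner step alpha = c / lambda_max, with c = 1
print(f"lambda_max ~ {lam_max:.6f}, alpha = {alpha:.6f}")
```

Power iteration converges to the eigenvalue of largest magnitude, so this sketch assumes the Hessian's dominant eigenvalue is positive, as it is for the convex stand-in energy used here.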

Checkpoints

| file | recipe | val CE (random = 8.32) |
|---|---|---|
| ebt_baseline.pt | esharp=1 (well-posed), plain | ~5 |
| ebt_naked_esharp8.pt | sharpened energy init, NO control | ~350 (diverged; this is the failure) |
| ebt_ours_esharp8.pt | same sharpened init + spectral control (α=c/λmax, power iteration) | ~5.7 (recovers baseline) |

On the same sharpened landscape, the naked, Langevin, and clamp variants all diverge (val CE 227–381); only the spectrally controlled model trains. The point: EBT's stability heuristics are stand-ins for landscape conditioning, and α·λ<2 replaces them with a provable, tuning-free guarantee (in particular, it replaces the randomized-step-size heuristic).
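For readers who want the one-line reason behind the α·λ<2 condition, here is the standard gradient-descent argument under a local quadratic model of the energy. The quadratic model, the minimizer y⋆, and the positive-definite Hessian H are assumptions of this sketch; the card itself states only the condition and the step rule α=c/λmax.

```latex
\begin{aligned}
E(\hat{y}) &\approx E(y^\star) + \tfrac{1}{2}\,(\hat{y}-y^\star)^\top H\,(\hat{y}-y^\star),
  \qquad H \succ 0,\\
\hat{y}_{k+1} &= \hat{y}_k - \alpha\,\nabla E(\hat{y}_k)
  = \hat{y}_k - \alpha H(\hat{y}_k - y^\star),\\
\hat{y}_{k+1} - y^\star &= (I - \alpha H)\,(\hat{y}_k - y^\star),\\
\text{contraction} &\iff |1-\alpha\lambda_i| < 1 \ \ \forall i
  \iff 0 < \alpha\lambda_i < 2 \ \ \forall i
  \iff 0 < \alpha\lambda_{\max} < 2,\\
\alpha = c/\lambda_{\max} &\implies \alpha\lambda_i \le c \ \ \forall i,
  \ \text{so any } 0 < c < 2 \text{ is stable.}
\end{aligned}
```

Under this model, sharpening the landscape raises λmax, so a fixed α that was stable before can cross the threshold, while α=c/λmax rescales with the curvature and stays inside it.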

Load & play

See the companion notebook (EBT_spectral_control.ipynb). It downloads these checkpoints, evaluates val CE, races adaptive-α against fixed-α inner optimization, and generates text token by token so you can watch the per-token "thinking" (inner optimization) and compare the controlled and uncontrolled models.
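To get a feel for the adaptive-α versus fixed-α race before opening the notebook, the toy below runs both on a quadratic energy. Everything in it is hypothetical: the curvatures, the `sharp` multiplier (one guess at what sharpening a landscape does, not the definition of `--esharp`), and the helper `inner_descent` are illustration only and are not taken from the notebook or ebt_small.py.

```python
def inner_descent(curv, y0, alpha, steps=30):
    """Gradient descent on E(y) = 0.5 * sum_i curv[i] * y[i]**2.

    Returns the energy after each step so the two step rules can be compared.
    """
    y = list(y0)
    energies = []
    for _ in range(steps):
        y = [yi - alpha * c * yi for c, yi in zip(curv, y)]
        energies.append(0.5 * sum(c * yi * yi for c, yi in zip(curv, y)))
    return energies


BASE_CURV = [1.0, 0.5, 0.1]  # hypothetical Hessian eigenvalues at sharp = 1
Y0 = [1.0, 1.0, 1.0]
FIXED_ALPHA = 1.5            # stable at sharp = 1, since 1.5 * 1.0 < 2
C = 1.0                      # adaptive rule: alpha = C / lambda_max

results = {}
for sharp in (1.0, 8.0):
    curv = [sharp * c for c in BASE_CURV]
    lam_max = max(curv)  # known exactly here; no estimation needed for a toy
    results[(sharp, "fixed")] = inner_descent(curv, Y0, FIXED_ALPHA)
    results[(sharp, "adaptive")] = inner_descent(curv, Y0, C / lam_max)

for (sharp, mode), energies in sorted(results.items()):
    print(f"sharp={sharp:g} {mode:8s} final energy = {energies[-1]:.3e}")
```

At `sharp = 8` the fixed step gives α·λmax = 12, well past the stability threshold of 2, so the energy grows at every step; the adaptive step keeps α·λmax = 1 at both sharpness levels.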

Data: val.bin (8 MB TinyStories-BPE val shard, uint16) + tokenizer.json (BPE, vocab 4096). Training script: ebt_small.py (e.g. python ebt_small.py --esharp 8 --adapt_c 1.0 --diag_speed). 2D toy with exact 2×2 Hessian: toy_ebt.py.
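A flat uint16 shard like val.bin can be read with the standard library alone. The sketch below assumes the file is a headerless, little-endian sequence of token ids, which is a common layout but is not confirmed by this card; `load_tokens` and `next_token_pair` are hypothetical helper names. It runs on a tiny stand-in file so it works without the real shard.

```python
import array
import os
import sys
import tempfile

VOCAB_SIZE = 4096  # from the model card


def load_tokens(path):
    """Read a headerless file of uint16 token ids (assumed little-endian)."""
    tokens = array.array("H")  # "H" = unsigned 16-bit on common platforms
    with open(path, "rb") as f:
        tokens.frombytes(f.read())
    if sys.byteorder == "big":
        tokens.byteswap()  # convert little-endian file data to native order
    return tokens


def next_token_pair(tokens, start, block_size):
    """Slice an (input, target) pair for next-token prediction."""
    x = list(tokens[start : start + block_size])
    y = list(tokens[start + 1 : start + 1 + block_size])
    return x, y


# Write a tiny stand-in shard; replace demo_path with "val.bin" for real data.
demo_ids = array.array("H", [5, 17, 4095, 0, 42, 7])
if sys.byteorder == "big":
    demo_ids.byteswap()  # store as little-endian on disk
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    f.write(demo_ids.tobytes())
    demo_path = f.name

tokens = load_tokens(demo_path)
os.remove(demo_path)

assert max(tokens) < VOCAB_SIZE, "token id outside the 4096-entry vocabulary"
x, y = next_token_pair(tokens, start=0, block_size=4)
print(len(tokens), x, y)
```

Decoding ids back to text requires tokenizer.json and a BPE tokenizer library, which this sketch deliberately leaves out.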

