EBT × spectral control — load & play

Companion artifacts for "Replacing EBT's stability heuristics with principled spectral control" (toy + small-transformer study of Energy-Based Transformers, Gladstone et al. arXiv:2507.02092).

The model is a real, if tiny (1.26M-parameter), causal-transformer EBT: a per-token energy E(h_t, ŷ) over a continuous C-dimensional candidate, inner gradient descent on ŷ, and second-order training on TinyStories-BPE (vocab 4096). λmax for the adaptive inner step is estimated by power iteration on Hessian-vector products (HVPs), so the exact Hessian is never formed.
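
The power-iteration estimate of λmax can be sketched as follows. This is a minimal illustration, not the code in ebt_small.py: the toy quadratic energy, the finite-difference HVP, and the names `grad_energy`, `hvp`, and `estimate_lambda_max` are all assumptions made for the example. A real implementation would more likely obtain the HVP from automatic differentiation.

```python
import math

# Hypothetical toy energy E(y) = 0.5 * y^T A y with a known symmetric A,
# so the estimate can be checked against the true top eigenvalue.
A = [[4.0, 1.0],
     [1.0, 3.0]]

def grad_energy(y):
    """Gradient of the toy energy: A @ y."""
    return [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]

def hvp(grad_fn, y, v, eps=1e-4):
    """Hessian-vector product via central differences of the gradient."""
    y_plus = [yi + eps * vi for yi, vi in zip(y, v)]
    y_minus = [yi - eps * vi for yi, vi in zip(y, v)]
    g_plus, g_minus = grad_fn(y_plus), grad_fn(y_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def estimate_lambda_max(grad_fn, y, iters=50):
    """Power iteration using only HVPs; the Hessian is never materialized."""
    v = [1.0 / math.sqrt(len(y))] * len(y)  # deterministic unit start vector
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, y, v)
        norm = math.sqrt(sum(h * h for h in hv))
        if norm == 0.0:
            break
        lam = sum(h * vi for h, vi in zip(hv, v))  # Rayleigh quotient v^T H v
        v = [h / norm for h in hv]
    return lam

lambda_max = estimate_lambda_max(grad_energy, y=[0.5, -0.2])
print(f"estimated lambda_max = {lambda_max:.4f}")
```

For this toy matrix the exact top eigenvalue is (7 + √5)/2, which the estimate should match closely. Note that power iteration converges to the eigenvalue of largest magnitude, so on a non-convex energy the result may need a sign check before it is used as a step-size denominator.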

Checkpoints

| file | recipe | val CE (random = 8.32) |
|---|---|---|
| `ebt_baseline.pt` | esharp=1 (well-posed), plain | ~5 |
| `ebt_naked_esharp8.pt` | sharpened energy init, NO control | ~350 (diverged: the failure) |
| `ebt_ours_esharp8.pt` | same sharpened init + spectral control (α=c/λmax, power-iteration) | ~5.7 (recovers baseline) |
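
The "random = 8.32" reference value is consistent with the cross-entropy, in nats, of a uniform guess over the 4096-token vocabulary. A quick check, assuming that is how the figure was obtained:

```python
import math

vocab_size = 4096  # BPE vocabulary size stated above
# Cross-entropy (in nats) of a uniform distribution over the vocabulary.
uniform_ce = math.log(vocab_size)
print(f"{uniform_ce:.2f}")  # 8.32
```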

On the same sharpened landscape, the naked, Langevin, and clamp variants all diverge (val CE 227–381); only spectral control trains. The point: EBT's stability heuristics are stand-ins for landscape conditioning, and the condition α·λ<2 replaces them with a guarantee that is provable and tuning-free (it replaces the randomized-step-size heuristic).
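
The reason α·λ<2 is the relevant threshold: on a quadratic energy with Hessian eigenvalues λ_i, a gradient step ŷ ← ŷ − α∇E scales the error along each eigen-direction by (1 − αλ_i), which shrinks only when 0 < αλ_i < 2. Setting α = c/λmax with c < 2 satisfies this in every direction. The sketch below is a hypothetical illustration, not toy_ebt.py: the diagonal quadratic, the `sharp` multiplier standing in for energy sharpening, and the fixed step of 0.3 are assumptions chosen for the example.

```python
def inner_descent(lams, alpha, steps=50, y0=(1.0, 1.0)):
    """Gradient descent on E(y) = 0.5 * sum(lam_i * y_i^2); returns final |y|."""
    y = list(y0)
    for _ in range(steps):
        y = [yi - alpha * lam * yi for yi, lam in zip(y, lams)]
    return sum(yi * yi for yi in y) ** 0.5

base_lams = (1.0, 0.25)  # hypothetical well-posed curvatures
fixed_alpha = 0.3        # hypothetical step that works in the well-posed case
c = 1.0                  # the constant in alpha = c / lambda_max (illustrative value)

results = {}
for sharp in (1.0, 8.0):
    lams = tuple(sharp * lam for lam in base_lams)
    lam_max = max(lams)  # exact here; the model estimates it by power iteration
    results[sharp] = {
        "fixed": inner_descent(lams, fixed_alpha),
        "adaptive": inner_descent(lams, c / lam_max),
    }

for sharp, r in results.items():
    print(f"sharp={sharp}: fixed |y|={r['fixed']:.3e}, adaptive |y|={r['adaptive']:.3e}")
```

With these assumed numbers, the fixed step gives α·λmax = 2.4 at sharp = 8, so the iterate grows without bound, while the adaptive step keeps α·λmax = c = 1 at any sharpness and converges.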

Load & play

See the companion notebook (EBT_spectral_control.ipynb). It downloads these checkpoints, evaluates val CE, races adaptive-α against fixed-α inner optimization, and generates text token by token, so you can watch the per-token "thinking" (inner optimization) and compare the controlled and uncontrolled models.

Data: `val.bin` (8 MB TinyStories-BPE val shard, uint16) + `tokenizer.json` (BPE, vocab 4096). Training script: `ebt_small.py` (e.g. `python ebt_small.py --esharp 8 --adapt_c 1.0 --diag_speed`). 2D toy with exact 2×2 Hessian: `toy_ebt.py`.
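
A flat uint16 token shard like the one described above can be read with the standard library alone. This is a minimal sketch under assumptions: it treats the file as headerless and in the machine's native byte order, and `load_tokens` is a hypothetical helper, not something shipped with these artifacts. It is demonstrated on a tiny synthetic shard so it runs without the download.

```python
import array
import os
import tempfile

def load_tokens(path):
    """Read a flat binary file of uint16 token ids into a list of ints."""
    tokens = array.array("H")  # "H" = unsigned 16-bit on common platforms
    with open(path, "rb") as f:
        n_items = os.path.getsize(path) // tokens.itemsize
        tokens.fromfile(f, n_items)
    return tokens.tolist()

# Demonstration on a tiny synthetic shard (hypothetical ids, not real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        array.array("H", [17, 4095, 0, 256]).tofile(f)
    demo_tokens = load_tokens(demo_path)

# Every id should fall inside the 4096-entry vocabulary.
assert all(0 <= t < 4096 for t in demo_tokens)
```

Pointing `load_tokens` at the real shard would return the validation token ids, provided the layout assumptions hold.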

Paper for blackhao0426/ebt-spectral-control

Energy-Based Transformers are Scalable Learners and Thinkers (arXiv:2507.02092, published Jul 2, 2025)