SONAR SAEs — architecture comparison
Four-variant SAE comparison set on SONAR sentence embeddings, supporting Table 1 of:
Interpretability of Text Auto-Encoders using Sparse Auto-Encoders: A Sandbox for Interpreting Neuralese. Nicky Pochinkov & Jason Rich Darmawan, EACL 2026 (submitted).
The four variants are JumpReLU, Gated, Gated Normed, and BatchTopK.
Configuration (shared)
| Setting | Value |
|---|---|
| Input dim ($d_{\text{in}}$) | 1024 (SONAR embedding) |
| SAE dim ($m$) | 16,384 |
| dtype | float32 |
| Training samples | ~10M SONAR embedding vectors |
| Variant-specific | matched realised $L_0$ for fair comparison |
Architecture-specific gate/value parameters and sparsity penalties (or
$K$ for BatchTopK) were tuned per variant to reach a comparable $L_0$
(mean number of active latents per input) at evaluation. See Section 3
of the paper and Appendix B for the implementation-level differences,
in particular how Gated Normed addresses the shrink–amplify pathology
of plain Gated.
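To make "realised $L_0$" concrete, here is a minimal sketch that counts active latents per input and shows how a BatchTopK-style selection ties $L_0$ to $K$. It is a generic illustration and not the paper's implementation: the function names `realised_l0` and `batch_topk`, the ReLU before selection, and the random stand-in pre-activations are all assumptions.

```python
import torch


def realised_l0(latents: torch.Tensor) -> float:
    """Mean number of non-zero latents per input row."""
    return (latents != 0).sum(dim=-1).float().mean().item()


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest activations across the whole batch.

    Assumption: activations are passed through a ReLU before selection.
    """
    batch_size = pre_acts.shape[0]
    acts = torch.relu(pre_acts)
    flat = acts.flatten()
    top = torch.topk(flat, k * batch_size)
    out = torch.zeros_like(flat)
    out[top.indices] = top.values  # everything outside the top set stays zero
    return out.reshape(acts.shape)


torch.manual_seed(0)
# Hypothetical encoder pre-activations: 8 inputs, m = 16,384 latents.
pre_acts = torch.randn(8, 16384)
latents = batch_topk(pre_acts, k=32)
print(realised_l0(latents))
```

Because the selection runs over the whole batch, individual inputs may have more or fewer than $K$ active latents, but the batch mean is $K$ whenever at least $K \times$ batch size activations are positive. For the penalty-based variants, the same `realised_l0` measurement applies to their latents on held-out data.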
Files
Each subdirectory corresponds to one training run, identified by its
wandb run ID, and contains PyTorch Lightning checkpoints
(epoch=N-step=K.ckpt, where N is the epoch and K the global step) plus last.ckpt.
The run IDs match those in the wandb logs repo.
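To pick out an intermediate checkpoint instead of last.ckpt, the file names can be parsed directly. The sketch below assumes the epoch=N-step=K.ckpt naming described above; the helper names `list_checkpoints` and `latest_checkpoint` are hypothetical, and files that do not match the pattern (including last.ckpt) are skipped.

```python
import re
from pathlib import Path

# Matches names such as "epoch=3-step=12000.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt")


def list_checkpoints(run_dir):
    """Return (step, epoch, path) tuples for a run directory, sorted by step."""
    found = []
    for path in Path(run_dir).glob("*.ckpt"):
        match = CKPT_PATTERN.fullmatch(path.name)
        if match:
            epoch, step = int(match.group(1)), int(match.group(2))
            found.append((step, epoch, path))
    return sorted(found)


def latest_checkpoint(run_dir):
    """Return the path of the highest-step checkpoint, or None if none match."""
    checkpoints = list_checkpoints(run_dir)
    return checkpoints[-1][2] if checkpoints else None
```

Calling `list_checkpoints("<run_id>")` on a run subdirectory then gives the available training steps in order, which is handy when comparing variants at matched steps.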
Loading
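import torch

# Replace <run_id> with the name of a run subdirectory.
ckpt = torch.load("<run_id>/last.ckpt", map_location="cpu")
# ckpt["hyper_parameters"] holds the variant and its config;
# ckpt["state_dict"] holds the SAE weights.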
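Once a checkpoint is loaded, a small helper can summarise what it contains. This is a sketch that assumes the usual PyTorch Lightning checkpoint layout (`state_dict`, `hyper_parameters`, `global_step`); the function name `summarise_checkpoint` is hypothetical, and the parameter names inside `state_dict` depend on the training code, so none are assumed here.

```python
from typing import Any, Mapping


def summarise_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Collect hyperparameters, step, and parameter shapes from a checkpoint dict."""
    state_dict = ckpt.get("state_dict") or {}
    # Record the shape of every entry that looks like a tensor.
    shapes = {
        name: tuple(value.shape)
        for name, value in state_dict.items()
        if hasattr(value, "shape")
    }
    return {
        "hyper_parameters": dict(ckpt.get("hyper_parameters") or {}),
        "global_step": ckpt.get("global_step"),
        "param_shapes": shapes,
    }
```

Passing the `ckpt` returned by `torch.load` to `summarise_checkpoint` and printing the result shows which variant a run used. Given the shared configuration, the encoder and decoder weights should have dimensions 1024 and 16,384 in some order.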