SONAR SAEs — architecture comparison

Four-variant SAE comparison set on SONAR sentence embeddings, supporting Table 1 of:

Interpretability of Text Auto-Encoders using Sparse Auto-Encoders: A Sandbox for Interpreting Neuralese. Nicky Pochinkov & Jason Rich Darmawan, EACL 2026 (submitted).

The four variants compared are: JumpReLU, Gated, Gated Normed, BatchTopK.

Configuration (shared)

| Setting | Value |
|---|---|
| Input dim ($d_{\text{in}}$) | 1024 (SONAR embedding) |
| SAE dim ($m$) | 16,384 |
| dtype | float32 |
| Training samples | ~10M SONAR embedding vectors |
| Variant-specific | matched realised $L_0$ for fair comparison |
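As a rough sanity check on checkpoint sizes, the shared dimensions above imply the following parameter count for a plain SAE. This is a minimal sketch that assumes untied encoder and decoder weights with one bias vector each; the function name is hypothetical, and the individual variants add parameters of their own (for example thresholds or gate parameters) that are not counted here.

```python
D_IN = 1024            # SONAR embedding dimension (from the table above)
M = 16_384             # SAE dictionary size (from the table above)
BYTES_PER_FLOAT32 = 4


def plain_sae_param_count(d_in: int, m: int) -> int:
    """Parameter count for an assumed plain SAE: W_enc, b_enc, W_dec, b_dec."""
    encoder = d_in * m + m      # weight matrix plus one bias per latent
    decoder = m * d_in + d_in   # weight matrix plus one bias per input dim
    return encoder + decoder


n_params = plain_sae_param_count(D_IN, M)
size_mib = n_params * BYTES_PER_FLOAT32 / 2**20
print(n_params, round(size_mib, 1))  # 33571840 128.1
```

Lightning checkpoints may also store optimizer state and other bookkeeping, so files on disk can be noticeably larger than this estimate.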

Architecture-specific gate/value parameters and sparsity penalties (or $K$ for BatchTopK) were tuned per variant to reach a comparable $L_0$ at evaluation. See Section 3 and Appendix B of the paper for the implementation-level differences, in particular how Gated Normed closes the shrink–amplify pathology of plain Gated.
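To make "realised $L_0$" concrete, the sketch below computes the mean number of non-zero latents per sample, and shows one common formulation of a BatchTopK activation in which $K$ sets the average sparsity across a batch. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, and the trained variants may differ in details such as thresholds used at inference time.

```python
import torch


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest activations across the whole batch.

    pre_acts has shape (batch_size, m). Individual samples may end up with
    more or fewer than k active latents; only the batch average is fixed.
    """
    batch_size = pre_acts.shape[0]
    acts = torch.relu(pre_acts)
    flat = acts.flatten()
    n_keep = min(k * batch_size, flat.numel())
    top = torch.topk(flat, n_keep)
    out = torch.zeros_like(flat)
    out[top.indices] = top.values
    return out.view_as(acts)


def realised_l0(latents: torch.Tensor) -> float:
    """Mean number of non-zero latents per sample."""
    return (latents != 0).sum(dim=-1).float().mean().item()


torch.manual_seed(0)
pre = torch.randn(8, 64)   # toy batch; the real SAEs use m = 16,384
z = batch_topk(pre, k=4)
print(realised_l0(z))
```

For penalty-based variants such as JumpReLU or Gated there is no explicit $K$, so the same `realised_l0` measurement on held-out embeddings is what a penalty would be tuned against.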

Files

Each subdirectory holds one wandb-tagged training run: its PyTorch Lightning checkpoints (epoch=N-step=K.ckpt) and a last.ckpt.
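When a run directory holds several intermediate checkpoints, it can help to order them by training step. The sketch below parses the epoch=N-step=K.ckpt naming pattern described above; the helper names are hypothetical, and files that do not match the pattern (such as last.ckpt) are skipped.

```python
import re

# Matches names such as "epoch=3-step=12000.ckpt".
CKPT_NAME = re.compile(r"^epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_ckpt_name(name):
    """Return (epoch, step) as integers, or None if the name does not match."""
    match = CKPT_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sort_by_step(names):
    """Return the matching checkpoint names ordered by increasing step."""
    numbered = [(parse_ckpt_name(n), n) for n in names]
    numbered = [(p[1], n) for p, n in numbered if p is not None]
    return [n for _, n in sorted(numbered)]
```

A typical call would pass the file names from one run directory, for example `sort_by_step(p.name for p in pathlib.Path("<run_id>").glob("*.ckpt"))`.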

The wandb run IDs match those in the wandb logs repo.

Loading

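import torch
# PyTorch >= 2.6 defaults to weights_only=True, which may reject checkpoints holding non-tensor objects.
# If loading fails for that reason, pass weights_only=False, but only for files you trust.
ckpt = torch.load("<run_id>/last.ckpt", map_location="cpu")
# Inspect ckpt["hyper_parameters"] for the variant + config
print(ckpt["hyper_parameters"])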
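Once a checkpoint is loaded, a small summary makes it easier to see which variant it is and how its weights are laid out. The sketch below assumes the usual Lightning checkpoint layout with "state_dict" and "hyper_parameters" entries; the helper name is hypothetical, and the parameter names inside each state dict depend on the training code, so they are not assumed here.

```python
def summarize_checkpoint(ckpt: dict) -> dict:
    """Collect top-level keys, hyperparameters, and parameter shapes."""
    state = ckpt.get("state_dict", {})
    return {
        "top_level_keys": sorted(ckpt.keys()),
        "hyper_parameters": dict(ckpt.get("hyper_parameters", {})),
        # Anything with a .shape attribute (e.g. a tensor) is reported.
        "param_shapes": {
            name: tuple(value.shape)
            for name, value in state.items()
            if hasattr(value, "shape")
        },
    }
```

Calling `summarize_checkpoint(ckpt)` on the dictionary returned by `torch.load` gives a plain dict that can be printed or compared across the four runs.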

Citation

See nickypro/sonar-saes-large.
