SONAR SAEs — architecture comparison

Four-variant SAE comparison set on SONAR sentence embeddings, supporting Table 1 of:

Interpretability of Text Auto-Encoders using Sparse Auto-Encoders: A Sandbox for Interpreting Neuralese. Nicky Pochinkov & Jason Rich Darmawan, EACL 2026 (submitted).

The four variants compared are: JumpReLU, Gated, Gated Normed, BatchTopK.

Configuration (shared)

Parameter	Value
Input dim ($d_{\text{in}}$)	1024 (SONAR embedding)
SAE dim ($m$)	16,384
dtype	float32
Training samples	~10M SONAR embedding vectors
Variant-specific	matched realised $L_0$ for fair comparison

Architecture-specific gate/value parameters and sparsity penalties (or $K$ for BatchTopK) were tuned per variant so that each reaches a comparable realised $L_0$ at evaluation. See Section 3 and Appendix B of the paper for the implementation-level differences, in particular how Gated Normed closes the shrink–amplify pathology of plain Gated.
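To make "matched realised $L_0$" concrete, here is a minimal sketch of how $L_0$ could be measured and how $K$ relates to it for a batch-level top-k activation. This is a generic illustration, not the code used for these checkpoints: the function names `realised_l0` and `batch_topk`, the toy tensor shapes, and the choice of $K = 4$ are all assumptions made for the example.

```python
import torch


def realised_l0(feature_acts: torch.Tensor) -> float:
    """Mean number of non-zero SAE features per input (rows = inputs)."""
    return (feature_acts != 0).sum(dim=-1).float().mean().item()


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Generic batch-level top-k: keep the k * batch_size largest
    activations across the whole batch and zero out the rest."""
    acts = torch.relu(pre_acts)
    n_keep = k * acts.shape[0]
    flat = acts.flatten()
    keep_idx = flat.topk(n_keep).indices
    mask = torch.zeros_like(flat)
    mask[keep_idx] = 1.0
    return (flat * mask).view_as(acts)


# Toy example: 8 inputs, 64 features (hypothetical sizes, far smaller than m = 16,384).
torch.manual_seed(0)
pre_acts = torch.randn(8, 64)
sparse_acts = batch_topk(pre_acts, k=4)
print(realised_l0(sparse_acts))
```

With a batch-level top-k the average $L_0$ over the batch is set directly by $K$ (as long as enough activations are positive), whereas for penalty-based variants the sparsity coefficient has to be tuned until the measured $L_0$ lands in the same range.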

Files

Each subdirectory is one wandb-tagged training run, containing PyTorch Lightning checkpoints (epoch=N-step=K.ckpt) and last.ckpt.

The wandb run IDs match those in the wandb logs repo.
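Given the epoch=N-step=K.ckpt naming described above, one way to pick the most-trained intermediate checkpoint in a run directory is to parse the step out of the filename. The sketch below uses only the standard library; the helper names `parse_ckpt_name` and `latest_checkpoint` are hypothetical, and it assumes filenames follow the pattern exactly (files such as last.ckpt are skipped).

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches names like "epoch=3-step=12000.ckpt".
CKPT_RE = re.compile(r"^epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_ckpt_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (epoch, step) for a matching filename, else None."""
    match = CKPT_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step, or None if none match."""
    candidates = []
    for path in Path(run_dir).glob("*.ckpt"):
        parsed = parse_ckpt_name(path.name)
        if parsed is not None:
            candidates.append((parsed[1], path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Calling `latest_checkpoint("<run_id>")` would then return a path that can be passed to `torch.load` in the same way as last.ckpt.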

Loading

import torch
ckpt = torch.load("<run_id>/last.ckpt", map_location="cpu")
# Inspect ckpt["hyper_parameters"] for the variant + config
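A slightly fuller sketch of the inspection step is given here. It assumes the checkpoints use the standard PyTorch Lightning layout (top-level keys such as `state_dict`, `hyper_parameters`, `epoch` and `global_step`); the helper name `summarize_checkpoint` is hypothetical, and no parameter names inside the state dict are assumed. Recent PyTorch releases (2.6 and later) default to `weights_only=True` in `torch.load`, which can reject checkpoints that carry non-tensor objects, so the sketch passes `weights_only=False` explicitly. That setting unpickles arbitrary objects and should only be used on files from a trusted source.

```python
import torch


def summarize_checkpoint(path: str) -> dict:
    """Load a Lightning-style checkpoint on CPU and summarize its contents."""
    # weights_only=False permits non-tensor objects (e.g. hyperparameter
    # containers) to be unpickled; use only with trusted checkpoints.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    state_dict = ckpt.get("state_dict", {})
    return {
        "hyper_parameters": ckpt.get("hyper_parameters", {}),
        "epoch": ckpt.get("epoch"),
        "global_step": ckpt.get("global_step"),
        # Map each parameter name to its shape, without assuming any names.
        "param_shapes": {name: tuple(t.shape) for name, t in state_dict.items()},
    }
```

For example, `summarize_checkpoint("<run_id>/last.ckpt")["param_shapes"]` should show which tensors carry the 1024 and 16,384 dimensions listed in the configuration, which is a quick way to tell encoder from decoder weights before writing any model code.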

Citation

See nickypro/sonar-saes-large.

