SONAR SAEs: scaled-up BatchTopK

Scaled-up BatchTopK Sparse Autoencoder trained on SONAR sentence embeddings. This is the SAE used for the latent-explanation experiment in:

Interpretability of Text Auto-Encoders using Sparse Auto-Encoders: A Sandbox for Interpreting Neuralese. Nicky Pochinkov & Jason Rich Darmawan, EACL 2026 (submitted).

Configuration

Architecture: BatchTopK
Input dim ($d_{\text{in}}$): 1024 (SONAR embedding)
SAE dim ($m$): 131,072
Sparsity ($k$): 64
dtype: float32
Training tokens: ~1.228 B (NLLB-200 primary + mined + supplemental)
LR (constant): $3\times 10^{-4} \to 3\times 10^{-5}$
Hardware: 1× A100, ~32 hours
SAE Lens version: 6.11.0

See Table 8 of the paper for the full hyperparameter list.
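To make the sparsity setting concrete, here is a minimal pure-Python sketch of the BatchTopK selection rule as it is usually described: the `k * batch_size` largest activations are kept across the whole batch, so individual samples may keep more or fewer than `k`. The function name `batch_topk`, the toy values, and the positive-only filter are illustrative assumptions, not the code used to train this SAE; a real implementation would use tensor operations.

```python
from typing import List


def batch_topk(acts: List[List[float]], k: int) -> List[List[float]]:
    """Keep the k * batch_size largest activations across the batch; zero the rest."""
    batch_size = len(acts)
    budget = k * batch_size  # total number of activations allowed to stay non-zero

    # Flatten to (value, row, col) triples so the ranking spans all samples.
    flat = [(v, r, c) for r, row in enumerate(acts) for c, v in enumerate(row)]
    # Sort by value, largest first; ties are broken by position for determinism.
    flat.sort(key=lambda t: (-t[0], t[1], t[2]))

    out = [[0.0] * len(row) for row in acts]
    for v, r, c in flat[:budget]:
        if v > 0:  # assumption: only positive (ReLU-style) activations are kept
            out[r][c] = v
    return out


# Toy batch of 2 samples with 4 latents each (the real SAE has m = 131,072, k = 64).
acts = [
    [0.9, 0.1, 0.8, 0.0],
    [0.2, 0.3, 0.0, 0.05],
]
sparse = batch_topk(acts, k=1)
print(sparse)
```

With `k=1` and two samples the budget is two activations, and both happen to fall in the first sample. This is the main difference from per-sample TopK: `k` is an average over the batch rather than a per-sample limit.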

Files

Each subdirectory is one wandb-tagged training run. Files are PyTorch Lightning checkpoints (epoch=N-step=K.ckpt) and include optimizer state.

  • g97mb3sb/: primary scaled-up run (used in the paper)
  • pl1c7eo7/, dyfpsngy/: additional scaled-up runs
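Because the checkpoints include optimizer state, they are larger than the SAE weights alone. The following back-of-the-envelope sketch estimates the sizes from the configuration values. It assumes a standard SAE layout (encoder and decoder weight matrices plus one bias each) and an Adam-style optimizer holding two extra float32 buffers per parameter; neither is confirmed by this README, and the function name `sae_param_count` is illustrative.

```python
D_IN = 1024          # input dim, from the configuration
M = 131_072          # SAE dim, from the configuration
BYTES_PER_PARAM = 4  # float32


def sae_param_count(d_in: int, m: int) -> int:
    """Parameter count for an assumed layout: W_enc, b_enc, W_dec, b_dec."""
    encoder = d_in * m + m     # weight matrix plus one bias per latent
    decoder = m * d_in + d_in  # weight matrix plus one bias per input dim
    return encoder + decoder


n_params = sae_param_count(D_IN, M)
weights_bytes = n_params * BYTES_PER_PARAM
# Assumption: Adam-style optimizer state adds two float32 buffers per parameter.
ckpt_bytes_estimate = 3 * weights_bytes

print(f"expansion factor: {M // D_IN}x")
print(f"parameters:       {n_params:,}")
print(f"weights only:     {weights_bytes / 2**30:.2f} GiB")
print(f"with optimizer:   {ckpt_bytes_estimate / 2**30:.2f} GiB (estimate)")
```

Under these assumptions the weights take roughly 1 GiB and a full checkpoint roughly three times that, which is worth knowing before downloading several runs.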

Loading

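import torch
import sae_lens  # version >= 6.11.0

# Lightning checkpoints hold more than tensors (hyperparameters, optimizer state).
# On PyTorch >= 2.6, torch.load defaults to weights_only=True and may reject them;
# pass weights_only=False only for checkpoints you trust.
ckpt = torch.load("g97mb3sb/last.ckpt", map_location="cpu", weights_only=False)
state_dict = ckpt["state_dict"]  # model parameters
# config is embedded in ckpt["hyper_parameters"]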
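Parameter names in a Lightning `state_dict` are often prefixed with the attribute name under which the model is stored in the training module, so they may not match what a bare SAE class expects. The helper below is a small sketch for inspecting and renaming keys. The `"sae."` prefix and the key names in the toy dictionary are hypothetical; print the keys of the real checkpoint to see what it actually contains.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def strip_prefix(state_dict: Mapping[str, T], prefix: str) -> Dict[str, T]:
    """Return the entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for ckpt["state_dict"]; the real keys and values will differ.
toy_state_dict = {
    "sae.W_enc": "encoder weights",
    "sae.W_dec": "decoder weights",
    "loss_scale": "not part of the SAE",
}
sae_state = strip_prefix(toy_state_dict, "sae.")
print(sorted(sae_state))
```

With the real checkpoint, `sorted(state_dict)` shows the actual names, and the renamed dictionary can then be passed to `load_state_dict` of a module with matching parameter shapes.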

Related repos

Citation

@inproceedings{pochinkov2026sonarsae,
  title={Interpretability of Text Auto-Encoders using Sparse Auto-Encoders: A Sandbox for Interpreting Neuralese},
  author={Pochinkov, Nicky and Darmawan, Jason Rich},
  booktitle={Proceedings of EACL 2026},
  year={2026}
}