pythia-410m-saes-x32-l1-adaptive — Sparse Autoencoders on Pythia-410M (run_trial_2)

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of EleutherAI/pythia-410m, for the COLM SAE scaling-law experiments (source code on GitHub, full codebase on HF).


Base model	`EleutherAI/pythia-410m`
Layers covered	0–23 (all 24)
SAE expansion factor	32 → `F = 32,768` dictionary features per layer
Hidden dim being modeled	1024 (Pythia-410M residual stream)
L1 coefficient	initial `5e-4`, adapted during training toward a target `L0 ≈ 150`
Tokens trained	300 M (The Pile)
Snapshots per layer	6 — at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final`
Total files	144 `.pt` checkpoints (24 layers × 6 snapshots)
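For readers unfamiliar with the `L0` figure in the table: it conventionally denotes the average number of nonzero feature activations per token. A minimal sketch of that measurement follows, assuming activations are given as plain lists of floats. The function name `mean_l0` and the toy data are illustrative and are not taken from the training code.

```python
def mean_l0(feature_acts, eps=0.0):
    """Average count of active (|a| > eps) features per token.

    feature_acts: list of rows, one row of feature activations per token.
    """
    if not feature_acts:
        raise ValueError("need at least one token")
    counts = [sum(1 for a in row if abs(a) > eps) for row in feature_acts]
    return sum(counts) / len(counts)


# Toy example: 3 tokens, 6 features each (real rows would have 32,768 entries).
toy = [
    [0.0, 1.2, 0.0, 0.0, 0.3, 0.0],  # 2 active features
    [0.0, 0.0, 0.0, 2.1, 0.0, 0.0],  # 1 active feature
    [0.5, 0.0, 0.7, 0.0, 0.0, 0.9],  # 3 active features
]
print(mean_l0(toy))  # 2.0
```

At `L0 ≈ 150` with 32,768 features, roughly 0.46% of the dictionary is active on a typical token.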

File naming

sae_layer{LL}_{SNAPSHOT}.pt

`LL` is the zero-padded, two-digit layer index (00–23), and `SNAPSHOT` is one of `50M`, `100M`, `150M`, `200M`, `250M`, or `final` (the end-of-training checkpoint).

Examples:

sae_layer00_50M.pt — layer 0, after 50 M tokens
sae_layer12_final.pt — layer 12, final checkpoint
sae_layer23_250M.pt — layer 23, after 250 M tokens
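The naming pattern above can be expanded programmatically, for example to loop over all checkpoints. This is a small sketch; the constant and function names are illustrative and do not come from the repository.

```python
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]
NUM_LAYERS = 24  # layers 0-23


def checkpoint_filenames():
    """Return every checkpoint filename, ordered by layer and then by snapshot."""
    return [
        f"sae_layer{layer:02d}_{snap}.pt"
        for layer in range(NUM_LAYERS)
        for snap in SNAPSHOTS
    ]


names = checkpoint_filenames()
print(len(names))           # 144
print(names[0], names[-1])  # sae_layer00_50M.pt sae_layer23_final.pt
```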

Loading

import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive",
    filename="sae_layer12_final.pt",
)
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
# state contains the SAE encoder/decoder weights;
# see the training script in the GitHub repo for the exact module class.
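Because the exact parameter names depend on the training script, it can help to list what a checkpoint contains before wiring it into a module. The helper below is a hedged sketch: it assumes only that the loaded object is a dict-like mapping whose tensor entries expose a `.shape` attribute, and `summarize_checkpoint` is a hypothetical name.

```python
def summarize_checkpoint(state):
    """Map each entry name to its shape, or to its type name if it has no shape."""
    summary = {}
    for name, value in state.items():
        shape = getattr(value, "shape", None)
        summary[name] = tuple(shape) if shape is not None else type(value).__name__
    return summary


# Usage with the `state` object from the loading snippet above:
#     for name, shape in summarize_checkpoint(state).items():
#         print(name, shape)
```

With an expansion factor of 32 on a 1024-dimensional residual stream, the encoder and decoder matrices would be expected to have dimensions 1024 and 32,768 in some order. The key names and the orientation are not documented here and should be read from the output.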

Sister runs (same setup, different L1 coefficient)

This is part of a 3-run hyperparameter sweep over the SAE L1 sparsity coefficient:

run	L1 coefficient	schedule
pythia-410m-saes-x32-l1-adaptive	`5e-4` (initial)	adaptive, target `L0 ≈ 150`
pythia-410m-saes-x32-l1-3e-4-fixed	`3e-4`	fixed
pythia-410m-saes-x32-l1-8e-5-fixed	`8e-5`	fixed

Reproducing

The training script lives at run_trial_2/run_trial_2.py in the source repo. Hardware used: NVIDIA A100 80 GB PCIe.

python run_trial_2.py --phase train --num_tokens 300_000_000 --expansion 32

Related artifacts

Per-layer measurement JSONs and heatmap figures: run_trial_2/results/ on GitHub.
Full backup-restore documentation: COLM_BACKUP_RESTORE.md.
