pythia-410m-saes-x32-l1-3e-4-fixed: Sparse Autoencoders on Pythia-410M (run_exp_2_t1)

Figure: per-layer metrics heatmap

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of EleutherAI/pythia-410m, for the COLM SAE scaling-law experiments (source code on GitHub, full codebase on HF).

Figures: training curves across all 24 layers; predicted vs. measured loss floor

Contents

Base model: EleutherAI/pythia-410m
Layers covered: 0–23 (all 24)
SAE expansion factor: 32 → F = 32,768 dictionary features per layer
Hidden dim being modeled: 1024 (Pythia-410M residual stream)
L1 coefficient: 3e-4 (fixed)
Tokens trained: 300 M (PILE)
Snapshots per layer: 6 (at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus final)
Total files: 144 .pt checkpoints (24 layers × 6 snapshots)
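
The counts listed above follow from a few multiplications. A minimal sketch that recomputes them, assuming the "final" snapshot corresponds to the end of the 300 M-token run (the variable names are illustrative and not taken from the training code):

```python
# Recompute the headline numbers from the contents list.
# Variable names are illustrative; they are not taken from run_exp.py.
d_model = 1024              # Pythia-410M residual-stream width
expansion = 32              # SAE expansion factor
num_layers = 24             # layers 0-23
total_tokens = 300_000_000

num_features = expansion * d_model   # dictionary size F per layer

# Intermediate snapshots every 50 M tokens; "final" is assumed to be
# the end of the 300 M-token run.
snapshot_tokens = list(range(50_000_000, total_tokens, 50_000_000))
snapshots_per_layer = len(snapshot_tokens) + 1   # + "final"
total_files = num_layers * snapshots_per_layer

print(num_features)         # 32768
print(snapshots_per_layer)  # 6
print(total_files)          # 144
```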

File naming

sae_layer{LL}_{SNAPSHOT}.pt

Where LL is the layer index (00–23) and SNAPSHOT is one of 50M, 100M, 150M, 200M, 250M, final.

Examples:

  • sae_layer00_50M.pt
  • sae_layer12_final.pt
  • sae_layer23_250M.pt
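
The naming scheme can be turned into a small helper for iterating over the whole repository. This is a sketch: `checkpoint_filename`, `SNAPSHOTS`, and `NUM_LAYERS` are hypothetical names introduced here, not part of the source repo.

```python
# Build every checkpoint filename implied by the naming scheme.
# The helper and constant names are hypothetical, chosen for illustration.
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]
NUM_LAYERS = 24

def checkpoint_filename(layer: int, snapshot: str) -> str:
    """Return the filename for one layer/snapshot pair, e.g. sae_layer00_50M.pt."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    return f"sae_layer{layer:02d}_{snapshot}.pt"

all_files = [checkpoint_filename(l, s) for l in range(NUM_LAYERS) for s in SNAPSHOTS]
print(len(all_files))   # 144
print(all_files[0])     # sae_layer00_50M.pt
print(all_files[-1])    # sae_layer23_final.pt
```

Any of these names can be passed as the `filename` argument of `hf_hub_download`.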

Loading

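import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)
# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)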
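
This card does not document the internal layout of a checkpoint, so it helps to inspect one before wiring it into a model. The sketch below assumes the loaded object is a (possibly nested) dict whose leaves are tensors. `describe_checkpoint` is a hypothetical helper and relies only on each leaf's `.shape` attribute.

```python
from collections.abc import Mapping

def describe_checkpoint(obj, prefix=""):
    """Return (name, shape) pairs for every leaf in a loaded checkpoint.

    Hypothetical helper: it assumes a (possibly nested) mapping whose leaves
    expose a `.shape` attribute, as torch tensors do. Leaves without one
    (ints, floats, strings) are reported with shape None.
    """
    rows = []
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(describe_checkpoint(value, name))
    else:
        shape = getattr(obj, "shape", None)
        rows.append((prefix, tuple(shape) if shape is not None else None))
    return rows
```

With the `state` object from the loading snippet, `for name, shape in describe_checkpoint(state): print(name, shape)` prints one line per entry. For a 32x SAE on a 1024-wide residual stream, weight matrices with dimensions 1024 and 32,768 would be expected, but the actual key names are not stated here and should be read from the output.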

Sister runs (same setup, different L1 coefficient)

run                                   L1 coefficient    target
pythia-410m-saes-x32-l1-adaptive      5e-4 (adaptive)   target L0 ≈ 150
pythia-410m-saes-x32-l1-3e-4-fixed    3e-4              fixed
pythia-410m-saes-x32-l1-8e-5-fixed    8e-5              fixed

Reproducing

The training script is at run_exp_2_t1/run_exp.py in the source repo. Hardware used: NVIDIA A100 80 GB PCIe.

python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4

Related artifacts
