SAE: Gemma-2-2B Layer 12 Residual Stream (v9c)

A TopK sparse autoencoder (SAE) trained on the residual stream after layer 12 of google/gemma-2-2b. It is used in the LessWrong / Alignment Forum post "A sparse-feature audit of induction in Gemma-2-2B"; the accompanying code and interactive dashboard are linked from the GitHub repo.

Quick facts


| Property | Value |
| --- | --- |
| Architecture | TopK SAE |
| Hook | `blocks.12.hook_resid_post` |
| `d_in` | 2,304 |
| `d_sae` | 16,384 |
| L0 / k | 100 |
| Training tokens | 200M |
| Dataset | `monology/pile-uncopyrighted` (BOS-excluded) |
| Library | `saprmarks/dictionary_learning` 0.1.0; converted to SAELens 6.43.0 format |
| Final explained variance | 0.85 (peak 0.893) |
| Dead features | 0 |
| Hardware | Single RTX 5070 Ti (16 GB) |
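To make the quick facts concrete, here is a minimal NumPy sketch of a generic TopK SAE forward pass, together with the two metrics quoted above (L0 and explained variance). It is not the trained model: the weights are random, the sizes are toy values, and the helper names (`topk_encode`, `decode`, `explained_variance`) are hypothetical. The explained-variance formula is one common definition and may differ in detail from the one used during training.

```python
import numpy as np

def topk_encode(x, W_enc, b_enc, b_dec, k):
    """Keep the k largest ReLU pre-activations per row and zero the rest."""
    pre = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)
    # Indices of the k largest entries in each row (order inside the top-k is irrelevant).
    top_idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, top_idx, np.take_along_axis(pre, top_idx, axis=-1), axis=-1)
    return z

def decode(z, W_dec, b_dec):
    """Reconstruct the input as a sparse linear combination of decoder rows."""
    return z @ W_dec + b_dec

def explained_variance(x, x_hat):
    """1 - (residual sum of squares) / (total sum of squares around the batch mean)."""
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return 1.0 - resid / total

# Toy sizes for illustration; the card's values are d_in=2304, d_sae=16384, k=100.
rng = np.random.default_rng(0)
d_in, d_sae, k = 16, 64, 4
W_enc = rng.normal(size=(d_in, d_sae)) / np.sqrt(d_in)  # random, NOT trained weights
W_dec = W_enc.T.copy()
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_in)

x = rng.normal(size=(8, d_in))           # stand-in for residual-stream activations
z = topk_encode(x, W_enc, b_enc, b_dec, k)
x_hat = decode(z, W_dec, b_dec)
l0 = (z != 0).sum(axis=-1)               # active features per input, at most k
```

Because exactly k entries are retained per input, L0 never exceeds k, which is why the table reports a single "L0 / k" value.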

Loading

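from huggingface_hub import snapshot_download
from sae_lens.saes.sae import SAE

# SAE.load_from_disk expects a local directory, so fetch the repo from the Hub first.
local_dir = snapshot_download("sohumsen/sae-gemma2-2b-layer12-v9c")

sae = SAE.load_from_disk(
    local_dir,
    device="cuda",   # use "cpu" if no GPU is available
)

If the files are already on disk, skip huggingface_hub.snapshot_download and pass that directory to SAE.load_from_disk directly.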

What this SAE is for

It decomposes Gemma-2-2B's layer-12 residual stream into 16,384 named, monosemantic features. Of those, ~100 are causally implicated in induction-style in-context learning (predicting B after seeing A B ... A). The top induction feature, F15289, fires on the second occurrence of a repeated word ("Never...Never", "Tier...Tier", ...).
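As a small illustration of the A B ... A pattern described above, the sketch below finds the positions where a token repeats and the token that followed its previous occurrence, which is the continuation an induction mechanism would predict. Both helpers (`repeat_positions`, `induction_targets`) are hypothetical and operate on whole words for readability; the model itself works on subword tokens.

```python
def repeat_positions(tokens):
    """Return indices i where tokens[i] has already appeared earlier in the sequence."""
    seen = set()
    positions = []
    for i, tok in enumerate(tokens):
        if tok in seen:
            positions.append(i)
        else:
            seen.add(tok)
    return positions

def induction_targets(tokens):
    """Map each repeat position to the token that followed the previous occurrence.

    For a sequence A B ... A, the repeated A maps to B.
    """
    last_seen = {}
    targets = {}
    for i, tok in enumerate(tokens):
        if tok in last_seen:
            targets[i] = tokens[last_seen[tok] + 1]
        last_seen[tok] = i
    return targets

tokens = ["Never", "give", "up", ".", "Never", "give", "in"]
repeats = repeat_positions(tokens)
targets = induction_targets(tokens)
```

Positions returned by `repeat_positions` are where a repeated-token feature such as F15289 would be expected to activate, so they are a natural place to inspect SAE activations when checking the claim on new text.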

For the full story — feature ranking, head-correspondence ablations, library-comparison notes (SAELens TopK plateaus on this task; dictionary_learning does not) — see the GitHub repo.

License

MIT, same as the source repository.
