Instructions to use dams2005/olmo-3-7b-think-saes with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- SAELens
How to use dams2005/olmo-3-7b-think-saes with SAELens:
# pip install sae-lens from sae_lens import SAE sae, cfg_dict, sparsity = SAE.from_pretrained( release = "RELEASE_ID", # e.g., "gpt2-small-res-jb". See other options in https://github.com/jbloomAus/SAELens/blob/main/sae_lens/pretrained_saes.yaml sae_id = "SAE_ID", # e.g., "blocks.8.hook_resid_pre". Won't always be a hook point ) - Notebooks
- Google Colab
- Kaggle
OLMo-3-7B-Think Sparse Autoencoders
This repository contains sparse autoencoders (SAEs) trained on post-layer
residual stream activations from allenai/Olmo-3-7B-Think.
Each folder is packaged for SAELens with cfg.json and
sae_weights.safetensors. The original training checkpoints included
optimizer state; this release contains inference-only weights.
Code
Training and collection code is available at daMS2005/olmo-sae-builder.
Model Details
- Base model:
allenai/Olmo-3-7B-Think - Activation type: post-layer residual stream
- Activation width:
4096 - Hook convention: layer
Lusesoutputs.hidden_states[L + 1]from Hugging Face Transformers - Collection context length: mixed-length chunks up to
8192tokens - Training data source: Dolma
v1_6-sampleplus Dolci-Instruct-SFT text - Architecture: tied TopK sparse autoencoder
- Decoder/encoder directions are exported in SAELens tensor format
Data Collection
Activations were collected with the local collect_data.py pipeline:
- Downloaded all
103Dolmav1_6-sampleJSONL gzip shards fromallenai/dolma. - Downloaded all
15Dolci-Instruct-SFT train parquet shards fromallenai/Dolci-Instruct-SFT. - Streamed local files from disk row by row; the full text dataset was not loaded into RAM or VRAM.
- Tokenized text on CPU, inserted EOS separators, and packed mixed-length chunks.
- Used length buckets
(512, 2048),(3500, 5000), and(7000, 8192)tokens. - Ran
allenai/Olmo-3-7B-Thinkinbfloat16withtorch.no_grad(),eval(), anduse_cache=False. - Captured
olmo.model.layers[L]outputs for layers7,15,23, and31. - This is equivalent to using
outputs.hidden_states[L + 1]from Hugging Face Transformers. - Removed padding positions with the attention mask before saving activations.
- Saved activation tensors as float16 chunks with shape
[num_real_tokens, 4096]. - The release training runs used
380activation chunks per layer, about20.0Mtoken activations.
Recommended SAEs
Use these unless you specifically want to compare raw versus layernorm variants:
layer_07/width_131k/topk_64_rawlayer_15/width_163k/topk_64_rawlayer_23/width_163k/topk_128_rawlayer_31/width_163k/topk_128_raw
Layer 31 SAE
layer_31/width_163k/topk_128_raw is the final-layer SAE in this release.
- Layer:
31 - Latents:
163840 - TopK:
128 - Input normalization:
none(raw activations) - Batch size:
256 - Learning rate:
3e-4 - Final training mean normalized MSE:
0.326502
Validation after export:
SAE.from_pretrained(...)loaded successfully through SAELensencode()anddecode()ran on CPU and CUDA- Exported SAELens tensors matched the original training checkpoint with max absolute difference
0.0 - A small real-activation smoke test on 256 layer-31 vectors produced finite reconstructions
All Included SAEs
| SAE id | layer | width | k | norm | final normalized MSE | canonical |
|---|---|---|---|---|---|---|
layer_07/width_131k/topk_64_raw |
7 | 131072 | 64 | none | 0.274970 | yes |
layer_07/width_163k/topk_64_layernorm |
7 | 163840 | 64 | layer_norm | 0.277127 | |
layer_15/width_163k/topk_64_raw |
15 | 163840 | 64 | none | 0.303301 | yes |
layer_15/width_163k/topk_64_layernorm |
15 | 163840 | 64 | layer_norm | 0.304524 | |
layer_23/width_163k/topk_64_raw |
23 | 163840 | 64 | none | 0.343949 | |
layer_23/width_163k/topk_128_raw |
23 | 163840 | 128 | none | 0.329776 | yes |
layer_31/width_163k/topk_128_raw |
31 | 163840 | 128 | none | 0.326502 | yes |
The layernorm variants use SAELens runtime normalize_activations="layer_norm".
The raw variants reconstruct raw activation vectors directly.
SAE Sections
layer_07/width_131k/topk_64_raw
Layer 7 raw-activation TopK64 SAE; best layer 7 run by final loss.
- Status: recommended canonical SAE
- Layer:
7 - SAE width:
131072latents - Input width:
4096 - TopK:
64 - Activation normalization:
none - Final training mean normalized MSE:
0.274970 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-07/raw_prebias_131k_topk64/olmo_sae_layer07.pt
Raw residual stream activations; no runtime input normalization.
layer_07/width_163k/topk_64_layernorm
Layer 7 TopK64 SAE trained with runtime layernorm preprocessing.
- Status: comparison variant
- Layer:
7 - SAE width:
163840latents - Input width:
4096 - TopK:
64 - Activation normalization:
layer_norm - Final training mean normalized MSE:
0.277127 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-07/olmo_sae_layer07.pt
Uses SAELens runtime layer normalization before encoding and reverses it after decoding.
layer_15/width_163k/topk_64_raw
Layer 15 raw-activation TopK64 SAE; best layer 15 run by final loss.
- Status: recommended canonical SAE
- Layer:
15 - SAE width:
163840latents - Input width:
4096 - TopK:
64 - Activation normalization:
none - Final training mean normalized MSE:
0.303301 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-15/raw_prebias_163k_topk64/olmo_sae_layer15.pt
Raw residual stream activations; no runtime input normalization.
layer_15/width_163k/topk_64_layernorm
Layer 15 TopK64 SAE trained with runtime layernorm preprocessing.
- Status: comparison variant
- Layer:
15 - SAE width:
163840latents - Input width:
4096 - TopK:
64 - Activation normalization:
layer_norm - Final training mean normalized MSE:
0.304524 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-15/olmo_sae_layer15.pt
Uses SAELens runtime layer normalization before encoding and reverses it after decoding.
layer_23/width_163k/topk_64_raw
Layer 23 raw-activation TopK64 SAE; kept for same-k comparisons.
- Status: comparison variant
- Layer:
23 - SAE width:
163840latents - Input width:
4096 - TopK:
64 - Activation normalization:
none - Final training mean normalized MSE:
0.343949 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-23/raw_prebias_163k_topk64/olmo_sae_layer23.pt
Raw residual stream activations; no runtime input normalization.
layer_23/width_163k/topk_128_raw
Layer 23 raw-activation TopK128 SAE; best layer 23 run by final loss.
- Status: recommended canonical SAE
- Layer:
23 - SAE width:
163840latents - Input width:
4096 - TopK:
128 - Activation normalization:
none - Final training mean normalized MSE:
0.329776 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-23/raw_prebias_163k_topk128/olmo_sae_layer23.pt
Raw residual stream activations; no runtime input normalization.
layer_31/width_163k/topk_128_raw
Layer 31 raw-activation TopK128 SAE.
- Status: recommended canonical SAE
- Layer:
31 - SAE width:
163840latents - Input width:
4096 - TopK:
128 - Activation normalization:
none - Final training mean normalized MSE:
0.326502 - Source checkpoint:
dams2005/olmo-3-7b-think-sae-layer-31/raw_prebias_163k_topk128/olmo_sae_layer31.pt
Raw residual stream activations; no runtime input normalization.
SAELens Usage
from sae_lens import SAE
repo_id = "dams2005/olmo-3-7b-think-saes"
sae_id = "layer_31/width_163k/topk_128_raw"
sae = SAE.from_pretrained(repo_id, sae_id, device="cuda", dtype="float32")
Hugging Face Transformers Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sae_lens import SAE
model_id = "allenai/Olmo-3-7B-Think"
layer = 31
sae_id = "layer_31/width_163k/topk_128_raw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
dtype=torch.bfloat16,
device_map="auto",
)
sae = SAE.from_pretrained("dams2005/olmo-3-7b-think-saes", sae_id, device="cuda")
tokens = tokenizer("The Eiffel Tower is in", return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**tokens, output_hidden_states=True)
acts = outputs.hidden_states[layer + 1].to("cuda").float()
feature_acts = sae.encode(acts)
recons = sae.decode(feature_acts)
hidden_states[0] is the embedding output, so layer L activations are
hidden_states[L + 1].
Intended Use
These SAEs are intended for mechanistic interpretability research: feature inspection, activation analysis, and reconstruction experiments over OLMo 3 residual stream activations. They are not language models, classifiers, or safety filters.
Limitations
- The SAEs were trained for one epoch over the collected activation set.
- Reconstruction quality varies by layer; later-layer activations were harder to reconstruct.
- Features are not labeled or curated in this release.
- The layernorm variants change the SAE preprocessing basis at runtime; prefer the raw canonical variants for direct raw-residual analysis.
Files
manifest.json: machine-readable list of included SAEs and training metricssaelens_pretrained_saes.yaml: registry-style snippet for possible SAELens registration*/cfg.json: SAELens config for each SAE*/sae_weights.safetensors: SAELens weights for each SAE*/training.log: training log for the source run
Model tree for dams2005/olmo-3-7b-think-saes
Base model
allenai/Olmo-3-1025-7B