Parameter-trajectory crosscoders for vocabulary readout evolution
Trained-dictionary release for Learning to Read Out: Unembedding Dynamics in Language Model Pretraining. We train snapshot crosscoders on parameter tensors (rather than activations) sampled across pretraining checkpoints. Applied to the output unembedding $W_U$, this reveals how a sparse vocabulary readout forms, reorganizes, and becomes load-bearing during pretraining.
Code, figure-by-figure reproduction map, and retraining recipes:
https://github.com/hematteo/learning-to-read-out (see docs/REPRODUCE.md
and docs/DATA.md; per-run settings of record in configs/runs/).
Quick start
# everything (~180 GB)
hf download matteohe/parameter-trajectory-crosscoders --local-dir $UM_SSD_ROOT/hf_release/parameter-trajectory-crosscoders
# one model only
hf download matteohe/parameter-trajectory-crosscoders --include "pythia-1b/**" --local-dir ...
Each artifact is <name>.safetensors + <name>.config.json (training
hyperparameters and recomputed quality metrics) + <name>.md (card).
index.json is the machine-readable inventory of everything below.
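As a minimal sketch of the three-file layout described above, the helper below derives the sibling file paths from an artifact stem and reads the JSON header of a `.safetensors` file without loading any tensors. The function names are hypothetical and not part of the release. The header layout (an 8-byte little-endian length followed by UTF-8 JSON) comes from the safetensors format itself. No tensor names are assumed, since they are not listed here.

```python
import json
import struct
from pathlib import Path


def artifact_files(stem: str) -> dict[str, Path]:
    """Map an artifact stem (path without extension) to its three files.

    Hypothetical helper: the release only states that each artifact ships as
    <name>.safetensors, <name>.config.json and <name>.md.
    """
    return {
        "weights": Path(stem + ".safetensors"),
        "config": Path(stem + ".config.json"),
        "card": Path(stem + ".md"),
    }


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors.

    The file starts with an 8-byte little-endian unsigned integer giving the
    header length, followed by that many bytes of UTF-8 JSON describing each
    tensor's dtype, shape and byte offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


if __name__ == "__main__":
    files = artifact_files("pythia-160m/W_U/cross-snapshot-32/d8192/seed0")
    for role, path in files.items():
        print(f"{role:8s} {path}")
    # Only inspect the header if the weights were already downloaded here.
    if files["weights"].exists():
        for name, meta in read_safetensors_header(files["weights"]).items():
            if name != "__metadata__":
                print(name, meta["dtype"], meta["shape"])
```

Reading only the header is a cheap way to check tensor names and shapes before committing memory to a full load. For the tensors themselves, the `safetensors` Python package is the usual route.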
What you probably want
| What | Path |
|---|---|
| Headline 5-seed Pythia-160M $W_U$ crosscoder | pythia-160m/W_U/cross-snapshot-32/d8192/seed{0..4}.safetensors |
| High-resolution 160M instrument (atlas) | pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors |
| Cross-scale (Pythia-1B) | pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors |
| Large-scale, selected sparse run | pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors |
| Cross-family (OLMo-2-7B) | olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors |
| Read/write asymmetry ($W_E$ side) | pythia-160m/W_E/cross-snapshot-32/... |
| Activation-rate aggregates (lifecycle figures) | derived/aggregates/, derived/rates/ |
| Attribution-patching artifacts | attribution/pythia-160m/ |
| Held-out eval token corpus | evaluation/eval-corpus/eval_tokens.pt |
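The headline entry in the table above uses the shorthand `seed{0..4}` for five files. A small sketch of expanding that shorthand into concrete paths follows. The function name is hypothetical, and the shell-style reading of `{a..b}` as an inclusive integer range is an assumption about the notation.

```python
import re

# Matches a shell-style inclusive integer range such as {0..4}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_seed_range(pattern: str) -> list[str]:
    """Expand the first {a..b} range in a path into one path per integer.

    Paths without a range are returned unchanged as a one-element list.
    """
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]
    lo, hi = int(m.group(1)), int(m.group(2))
    return [
        pattern[: m.start()] + str(i) + pattern[m.end():]
        for i in range(lo, hi + 1)
    ]


if __name__ == "__main__":
    headline = "pythia-160m/W_U/cross-snapshot-32/d8192/seed{0..4}.safetensors"
    for path in expand_seed_range(headline):
        print(path)
```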
Full inventory
| Path | Model | Matrix | Kind | d_sae | Seed | Quality |
|---|---|---|---|---|---|---|
| olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors | allenai/OLMo-2-1124-7B | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.853 / L0 557 |
| pythia-160m/W_E/cross-snapshot-32/d24576/seed0.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 24576 | 0 | EV 0.831 / L0 118 |
| pythia-160m/W_E/cross-snapshot-32/d8192/seed0.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 0 | EV 0.581 / L0 82 |
| pythia-160m/W_E/cross-snapshot-32/d8192/seed1.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 1 | EV 0.580 / L0 82 |
| pythia-160m/W_E/cross-snapshot-32/d8192/seed2.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 2 | EV 0.582 / L0 82 |
| pythia-160m/W_E/cross-snapshot-32/d8192/seed3.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 3 | EV 0.581 / L0 82 |
| pythia-160m/W_E/cross-snapshot-32/d8192/seed4.safetensors | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 4 | EV 0.583 / L0 83 |
| pythia-160m/W_U/architecture-comparison/d8192/batchtopk/seed0.safetensors | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.725 / L0 203 |
| pythia-160m/W_U/architecture-comparison/d8192/gated/seed0.safetensors | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.214 / L0 12 |
| pythia-160m/W_U/architecture-comparison/d8192/gated-retuned/seed0.safetensors | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.827 / L0 654 |
| pythia-160m/W_U/cross-snapshot-16/d8192/seed0.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-16 | 8192 | 0 | EV 0.773 / L0 216 |
| pythia-160m/W_U/cross-snapshot-32/d16384/seed0.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.780 / L0 103 |
| pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.920 / L0 286 |
| pythia-160m/W_U/cross-snapshot-32/d24576/seed1.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 1 | EV 0.920 / L0 286 |
| pythia-160m/W_U/cross-snapshot-32/d24576/seed2.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 2 | EV 0.920 / L0 286 |
| pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.776 / L0 203 |
| pythia-160m/W_U/cross-snapshot-32/d8192/seed1.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 1 | EV 0.776 / L0 203 |
| pythia-160m/W_U/cross-snapshot-32/d8192/seed2.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 2 | EV 0.776 / L0 203 |
| pythia-160m/W_U/cross-snapshot-32/d8192/seed3.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 3 | EV 0.776 / L0 203 |
| pythia-160m/W_U/cross-snapshot-32/d8192/seed4.safetensors | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 4 | EV 0.777 / L0 203 |
| pythia-160m/W_U/final-snapshot-saes/d16384.safetensors | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 16384 | 0 | EV 0.870 / L0 1913 |
| pythia-160m/W_U/final-snapshot-saes/d32768.safetensors | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 32768 | 0 | EV 0.926 / L0 3410 |
| pythia-160m/W_U/final-snapshot-saes/d6144.safetensors | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 6144 | 0 | EV 0.765 / L0 862 |
| pythia-160m/W_U/final-snapshot-saes/d65536.safetensors | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 65536 | 0 | EV 0.964 / L0 5943 |
| pythia-160m/W_U/final-snapshot-saes/d8192.safetensors | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 8192 | 0 | EV 0.799 / L0 1084 |
| pythia-160m/W_U/lambda-sweep/d8192/lam0p40_seed0.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.748 / L0 160 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p00_seed0.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.632 / L0 58 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p20_seed0.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.603 / L0 45 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed0.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.582 / L0 38 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed1.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 1 | EV 0.582 / L0 38 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed2.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 2 | EV 0.582 / L0 38 |
| pythia-160m/W_U/lambda-sweep/d8192/lam1p80_seed0.safetensors | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.528 / L0 23 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step0.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step1.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step1000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.786 / L0 997 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step102000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.800 / L0 983 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step116000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.812 / L0 958 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step128.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step130000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.820 / L0 940 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step14000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 996 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step143000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.824 / L0 924 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step16.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step2.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step2000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 969 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step21000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 998 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step256.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.732 / L0 1142 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step27000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 999 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step3000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 972 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step32.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step34000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step4.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step4000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.798 / L0 977 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step47000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step5000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.796 / L0 982 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step512.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.754 / L0 1087 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step6000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.795 / L0 985 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step61000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1002 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step64.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step7000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.794 / L0 988 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step75000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1004 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step8.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step8000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.793 / L0 990 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step89000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 1002 |
| pythia-160m/W_U/per-snapshot-saes/d8192/step9000.safetensors | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 992 |
| pythia-1b/W_U/cross-snapshot-32/d16384/seed0.safetensors | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.781 / L0 499 |
| pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.861 / L0 517 |
| pythia-1b/W_U/cross-snapshot-32/d8192/seed0.safetensors | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.628 / L0 374 |
| pythia-1b/W_U/cross-snapshot-32-matched-window/d24576/seed0.safetensors | EleutherAI/pythia-1b | W_U | cross-snapshot-32-matched-window | 24576 | 0 | EV 0.884 / L0 264 |
| pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.808 / L0 742 |
| pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0.safetensors | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.833 / L0 1957 |
Quality metrics are recomputed from the released weights on the released
snapshot schedule (see the code repo's scripts/eval/recompute_metrics.py).
The gated architecture-comparison run intentionally documents
λ-transfer failure (default λ=0.3 moved across architectures); see
gated-retuned (λ=0.05) for the tuned comparison point.
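To compare runs programmatically, the Quality cells and the lambda-sweep file names in the inventory can be parsed into numbers. The sketch below is illustrative only: the function names are hypothetical, and reading `lam1p35` as λ = 1.35 (with `p` standing in for the decimal point) is an assumption about the naming scheme. The row values are copied from the seed-0 lambda-sweep entries of the inventory table.

```python
import re

QUALITY = re.compile(r"EV\s+([0-9.]+)\s*/\s*L0\s+(\d+)")
LAMBDA_STEM = re.compile(r"lam(\d+)p(\d+)_seed(\d+)")


def parse_quality(cell: str) -> tuple[float, int]:
    """Split an inventory Quality cell such as 'EV 0.776 / L0 203' into (EV, L0)."""
    m = QUALITY.search(cell)
    if m is None:
        raise ValueError(f"unrecognized quality cell: {cell!r}")
    return float(m.group(1)), int(m.group(2))


def parse_lambda_stem(stem: str) -> tuple[float, int]:
    """Decode a lambda-sweep file stem into (lambda, seed).

    ASSUMPTION: 'p' stands in for the decimal point, so 'lam1p35' means 1.35.
    """
    m = LAMBDA_STEM.search(stem)
    if m is None:
        raise ValueError(f"unrecognized lambda-sweep stem: {stem!r}")
    return float(f"{m.group(1)}.{m.group(2)}"), int(m.group(3))


# Seed-0 rows of the lambda sweep, copied from the inventory table.
ROWS = {
    "lam0p40_seed0": "EV 0.748 / L0 160",
    "lam1p00_seed0": "EV 0.632 / L0 58",
    "lam1p20_seed0": "EV 0.603 / L0 45",
    "lam1p35_seed0": "EV 0.582 / L0 38",
    "lam1p80_seed0": "EV 0.528 / L0 23",
}

# One (lambda, EV, L0) triple per run, ordered by increasing lambda.
sweep = sorted(
    (parse_lambda_stem(stem)[0], *parse_quality(cell))
    for stem, cell in ROWS.items()
)

if __name__ == "__main__":
    for lam, ev, l0 in sweep:
        print(f"lambda={lam:.2f}  EV={ev:.3f}  L0={l0}")
```

Under that reading of the file names, both EV and L0 fall as λ rises across these five runs.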
Citation
@misc{he2026learningtoreadout,
title = {Learning to Read Out: Unembedding Dynamics in Language Model Pretraining},
author = {He, Matteo and Shen, William F. and Iacob, Alex and Jovanovic, Andrej
and Qiu, Xinchi and Lane, Nicholas D.},
year = {2026},
note = {Under review. Code: https://github.com/hematteo/learning-to-read-out},
}
Released under the MIT license. The W_U/W_E source tensors derive from public Apache-2.0 checkpoints (EleutherAI Pythia, AllenAI OLMo-2). The eval corpus derives from Wikipedia (CC-BY-SA 4.0).