Parameter-trajectory crosscoders for vocabulary readout evolution

Trained-dictionary release for Learning to Read Out: Unembedding Dynamics in Language Model Pretraining. We train snapshot crosscoders on parameter tensors (rather than activations) sampled across pretraining checkpoints. Applied to the output unembedding $W_U$, this reveals how a sparse vocabulary readout forms, reorganizes, and becomes load-bearing during pretraining.

Code, figure-by-figure reproduction map, and retraining recipes: https://github.com/hematteo/learning-to-read-out (see docs/REPRODUCE.md and docs/DATA.md; per-run settings of record in configs/runs/).

Quick start

# everything (~180 GB)
hf download matteohe/parameter-trajectory-crosscoders --local-dir $UM_SSD_ROOT/hf_release/parameter-trajectory-crosscoders
# one model only
hf download matteohe/parameter-trajectory-crosscoders --include "pythia-1b/**" --local-dir ...

Each artifact is <name>.safetensors + <name>.config.json (training hyperparameters and recomputed quality metrics) + <name>.md (card). index.json is the machine-readable inventory of everything below.
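The three-file layout above can be resolved from a single weights path. The following is a minimal sketch, not part of the release: `artifact_files` and `load_artifact` are hypothetical helper names, and the tensor names inside each file are not documented in this card, so inspect the returned dictionary rather than assuming particular keys.

```python
import json
from pathlib import Path


def artifact_files(weights_path):
    """Return the weights, config, and card paths that share one <name>."""
    p = Path(weights_path)
    return {
        "weights": p,
        "config": p.with_name(p.stem + ".config.json"),
        "card": p.with_name(p.stem + ".md"),
    }


def load_artifact(weights_path):
    """Load tensors and config for one artifact (needs safetensors and torch)."""
    # Imported lazily so the path helper works without these packages.
    from safetensors.torch import load_file

    files = artifact_files(weights_path)
    tensors = load_file(str(files["weights"]))  # dict: tensor name -> torch.Tensor
    config = json.loads(files["config"].read_text())
    return tensors, config
```

After a download, something like `tensors, config = load_artifact(local_dir / "pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors")` followed by printing each tensor's name and shape is a reasonable first check.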

What you probably want

| What | Path |
| --- | --- |
| Headline 5-seed Pythia-160M $W_U$ crosscoder | pythia-160m/W_U/cross-snapshot-32/d8192/seed{0..4}.safetensors |
| High-resolution 160M instrument (atlas) | pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors |
| Cross-scale (Pythia-1B) | pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors |
| Large-scale, selected sparse run | pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors |
| Cross-family (OLMo-2-7B) | olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors |
| Read/write asymmetry ($W_E$ side) | pythia-160m/W_E/cross-snapshot-32/... |
| Activation-rate aggregates (lifecycle figures) | derived/aggregates/, derived/rates/ |
| Attribution-patching artifacts | attribution/pythia-160m/ |
| Held-out eval token corpus | evaluation/eval-corpus/eval_tokens.pt |
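The `seed{0..4}` shorthand in the headline row stands for five separate files. As a rough sketch, the paths can be expanded and fetched one by one with `huggingface_hub.hf_hub_download`; the helper names `headline_seed_paths` and `download_headline_seeds` are illustrative and not part of the release.

```python
REPO_ID = "matteohe/parameter-trajectory-crosscoders"


def headline_seed_paths(n_seeds=5):
    """Expand the seed{0..4} shorthand into explicit repository paths."""
    return [
        f"pythia-160m/W_U/cross-snapshot-32/d8192/seed{s}.safetensors"
        for s in range(n_seeds)
    ]


def download_headline_seeds(local_dir):
    """Fetch weights and configs for the headline seeds (needs huggingface_hub)."""
    from huggingface_hub import hf_hub_download  # imported lazily

    local_paths = []
    for rel in headline_seed_paths():
        # Each artifact ships a sibling <name>.config.json next to the weights.
        for name in (rel, rel.replace(".safetensors", ".config.json")):
            local_paths.append(
                hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
            )
    return local_paths
```

This pulls only the ten files needed for the headline crosscoder rather than the full release.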

Full inventory

Path Model Matrix Kind d_sae Seed Quality
olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors allenai/OLMo-2-1124-7B W_U cross-snapshot-32 32768 0 EV 0.853 / L0 557
pythia-160m/W_E/cross-snapshot-32/d24576/seed0.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 24576 0 EV 0.831 / L0 118
pythia-160m/W_E/cross-snapshot-32/d8192/seed0.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 8192 0 EV 0.581 / L0 82
pythia-160m/W_E/cross-snapshot-32/d8192/seed1.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 8192 1 EV 0.580 / L0 82
pythia-160m/W_E/cross-snapshot-32/d8192/seed2.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 8192 2 EV 0.582 / L0 82
pythia-160m/W_E/cross-snapshot-32/d8192/seed3.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 8192 3 EV 0.581 / L0 82
pythia-160m/W_E/cross-snapshot-32/d8192/seed4.safetensors EleutherAI/pythia-160m W_E cross-snapshot-32 8192 4 EV 0.583 / L0 83
pythia-160m/W_U/architecture-comparison/d8192/batchtopk/seed0.safetensors EleutherAI/pythia-160m W_U architecture-comparison/d8192 8192 0 EV 0.725 / L0 203
pythia-160m/W_U/architecture-comparison/d8192/gated/seed0.safetensors EleutherAI/pythia-160m W_U architecture-comparison/d8192 8192 0 EV 0.214 / L0 12
pythia-160m/W_U/architecture-comparison/d8192/gated-retuned/seed0.safetensors EleutherAI/pythia-160m W_U architecture-comparison/d8192 8192 0 EV 0.827 / L0 654
pythia-160m/W_U/cross-snapshot-16/d8192/seed0.safetensors EleutherAI/pythia-160m W_U cross-snapshot-16 8192 0 EV 0.773 / L0 216
pythia-160m/W_U/cross-snapshot-32/d16384/seed0.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 16384 0 EV 0.780 / L0 103
pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 24576 0 EV 0.920 / L0 286
pythia-160m/W_U/cross-snapshot-32/d24576/seed1.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 24576 1 EV 0.920 / L0 286
pythia-160m/W_U/cross-snapshot-32/d24576/seed2.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 24576 2 EV 0.920 / L0 286
pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 8192 0 EV 0.776 / L0 203
pythia-160m/W_U/cross-snapshot-32/d8192/seed1.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 8192 1 EV 0.776 / L0 203
pythia-160m/W_U/cross-snapshot-32/d8192/seed2.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 8192 2 EV 0.776 / L0 203
pythia-160m/W_U/cross-snapshot-32/d8192/seed3.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 8192 3 EV 0.776 / L0 203
pythia-160m/W_U/cross-snapshot-32/d8192/seed4.safetensors EleutherAI/pythia-160m W_U cross-snapshot-32 8192 4 EV 0.777 / L0 203
pythia-160m/W_U/final-snapshot-saes/d16384.safetensors EleutherAI/pythia-160m W_U final-snapshot-saes 16384 0 EV 0.870 / L0 1913
pythia-160m/W_U/final-snapshot-saes/d32768.safetensors EleutherAI/pythia-160m W_U final-snapshot-saes 32768 0 EV 0.926 / L0 3410
pythia-160m/W_U/final-snapshot-saes/d6144.safetensors EleutherAI/pythia-160m W_U final-snapshot-saes 6144 0 EV 0.765 / L0 862
pythia-160m/W_U/final-snapshot-saes/d65536.safetensors EleutherAI/pythia-160m W_U final-snapshot-saes 65536 0 EV 0.964 / L0 5943
pythia-160m/W_U/final-snapshot-saes/d8192.safetensors EleutherAI/pythia-160m W_U final-snapshot-saes 8192 0 EV 0.799 / L0 1084
pythia-160m/W_U/lambda-sweep/d8192/lam0p40_seed0.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 0 EV 0.748 / L0 160
pythia-160m/W_U/lambda-sweep/d8192/lam1p00_seed0.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 0 EV 0.632 / L0 58
pythia-160m/W_U/lambda-sweep/d8192/lam1p20_seed0.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 0 EV 0.603 / L0 45
pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed0.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 0 EV 0.582 / L0 38
pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed1.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 1 EV 0.582 / L0 38
pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed2.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 2 EV 0.582 / L0 38
pythia-160m/W_U/lambda-sweep/d8192/lam1p80_seed0.safetensors EleutherAI/pythia-160m W_U lambda-sweep 8192 0 EV 0.528 / L0 23
pythia-160m/W_U/per-snapshot-saes/d8192/step0.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step1.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step1000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.786 / L0 997
pythia-160m/W_U/per-snapshot-saes/d8192/step102000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.800 / L0 983
pythia-160m/W_U/per-snapshot-saes/d8192/step116000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.812 / L0 958
pythia-160m/W_U/per-snapshot-saes/d8192/step128.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step130000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.820 / L0 940
pythia-160m/W_U/per-snapshot-saes/d8192/step14000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.791 / L0 996
pythia-160m/W_U/per-snapshot-saes/d8192/step143000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.824 / L0 924
pythia-160m/W_U/per-snapshot-saes/d8192/step16.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step2.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step2000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.799 / L0 969
pythia-160m/W_U/per-snapshot-saes/d8192/step21000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.791 / L0 998
pythia-160m/W_U/per-snapshot-saes/d8192/step256.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.732 / L0 1142
pythia-160m/W_U/per-snapshot-saes/d8192/step27000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.791 / L0 999
pythia-160m/W_U/per-snapshot-saes/d8192/step3000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.799 / L0 972
pythia-160m/W_U/per-snapshot-saes/d8192/step32.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step34000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.791 / L0 1000
pythia-160m/W_U/per-snapshot-saes/d8192/step4.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step4000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.798 / L0 977
pythia-160m/W_U/per-snapshot-saes/d8192/step47000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.791 / L0 1000
pythia-160m/W_U/per-snapshot-saes/d8192/step5000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.796 / L0 982
pythia-160m/W_U/per-snapshot-saes/d8192/step512.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.754 / L0 1087
pythia-160m/W_U/per-snapshot-saes/d8192/step6000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.795 / L0 985
pythia-160m/W_U/per-snapshot-saes/d8192/step61000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.790 / L0 1002
pythia-160m/W_U/per-snapshot-saes/d8192/step64.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step7000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.794 / L0 988
pythia-160m/W_U/per-snapshot-saes/d8192/step75000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.790 / L0 1004
pythia-160m/W_U/per-snapshot-saes/d8192/step8.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.729 / L0 1150
pythia-160m/W_U/per-snapshot-saes/d8192/step8000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.793 / L0 990
pythia-160m/W_U/per-snapshot-saes/d8192/step89000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.792 / L0 1002
pythia-160m/W_U/per-snapshot-saes/d8192/step9000.safetensors EleutherAI/pythia-160m W_U per-snapshot-saes 8192 0 EV 0.792 / L0 992
pythia-1b/W_U/cross-snapshot-32/d16384/seed0.safetensors EleutherAI/pythia-1b W_U cross-snapshot-32 16384 0 EV 0.781 / L0 499
pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors EleutherAI/pythia-1b W_U cross-snapshot-32 24576 0 EV 0.861 / L0 517
pythia-1b/W_U/cross-snapshot-32/d8192/seed0.safetensors EleutherAI/pythia-1b W_U cross-snapshot-32 8192 0 EV 0.628 / L0 374
pythia-1b/W_U/cross-snapshot-32-matched-window/d24576/seed0.safetensors EleutherAI/pythia-1b W_U cross-snapshot-32-matched-window 24576 0 EV 0.884 / L0 264
pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors EleutherAI/pythia-6.9b W_U cross-snapshot-32 32768 0 EV 0.808 / L0 742
pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0.safetensors EleutherAI/pythia-6.9b W_U cross-snapshot-32 32768 0 EV 0.833 / L0 1957
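For filtering the inventory programmatically, the path layout and the Quality strings in the table above are regular enough to parse. This is a sketch based only on the rows listed here; the function names are hypothetical, and index.json (whose schema is not described in this card) is likely the better source when it is available.

```python
import re

PATH_RE = re.compile(
    r"^(?P<model>[^/]+)/(?P<matrix>W_[UE])/(?P<kind>[^/]+)/(?P<rest>.+)\.safetensors$"
)
QUALITY_RE = re.compile(r"EV (?P<ev>\d+\.\d+) / L0 (?P<l0>\d+)")


def parse_artifact_path(path):
    """Split an inventory path into model, matrix, kind, d_sae, and seed."""
    m = PATH_RE.match(path)
    if m is None:
        raise ValueError(f"unrecognized artifact path: {path}")
    fields = m.groupdict()
    rest = fields.pop("rest")
    d = re.search(r"(?:^|/)d(\d+)(?:/|$)", rest)
    seed = re.search(r"seed(\d+)", rest)
    fields["d_sae"] = int(d.group(1)) if d else None
    # Some paths (per-snapshot and final-snapshot SAEs) do not encode a seed.
    fields["seed"] = int(seed.group(1)) if seed else None
    return fields


def parse_quality(text):
    """Turn a string like 'EV 0.853 / L0 557' into (ev, l0)."""
    m = QUALITY_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized quality string: {text}")
    return float(m["ev"]), int(m["l0"])
```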

Quality metrics are recomputed from the released weights on the released snapshot schedule (see the code repo's scripts/eval/recompute_metrics.py). The gated architecture-comparison run intentionally documents λ-transfer failure (default λ=0.3 moved across architectures); see gated-retuned (λ=0.05) for the tuned comparison point.
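For readers unfamiliar with the two Quality columns, the sketch below shows the conventional definitions of explained variance (EV) and L0 for a sparse dictionary. It is an assumption that the release follows these textbook forms; the definitions of record are in scripts/eval/recompute_metrics.py and may differ, for example in how snapshots are aggregated.

```python
import numpy as np


def explained_variance(x, x_hat):
    """1 - residual sum of squares / total sum of squares about the mean row."""
    resid = np.sum((x - x_hat) ** 2)
    total = np.sum((x - x.mean(axis=0, keepdims=True)) ** 2)
    return 1.0 - resid / total


def mean_l0(latents):
    """Average number of nonzero latents per input row."""
    return float(np.mean(np.count_nonzero(latents, axis=1)))
```

Here `x` holds the input rows (one per row of the parameter matrix), `x_hat` the reconstructions, and `latents` the dictionary activations with shape (rows, d_sae).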

Citation

@misc{he2026learningtoreadout,
  title  = {Learning to Read Out: Unembedding Dynamics in Language Model Pretraining},
  author = {He, Matteo and Shen, William F. and Iacob, Alex and Jovanovic, Andrej
            and Qiu, Xinchi and Lane, Nicholas D.},
  year   = {2026},
  note   = {Under review. Code: https://github.com/hematteo/learning-to-read-out},
}

License: MIT. W_U/W_E source tensors derive from public Apache-2.0 checkpoints (EleutherAI Pythia, AllenAI OLMo-2). The eval corpus derives from Wikipedia (CC-BY-SA 4.0).
