Gemma4 Audio SAE Atlas
This repository contains sparse autoencoder (SAE) artifacts trained on audio-token hidden states from google/gemma-4-12B-it, released for audio interpretability research.
Current release status
| Layer | Status | Checkpoint | Atlas labels | Evaluation |
|---|---|---|---|---|
| 24 | Complete local/ORD artifact | layers/l24/sae_layer_24.pt when packaged from ORD | GPT-5.5 monosemantic top/bottom atlas labels | held-out reconstruction + frozen clinical-transfer overlay |
| 48 | In progress | pending | pending | pending |
| 8 | Planned | pending | pending | pending |
What is included
- SAE checkpoints, if packaged from the machine that has them.
- Training/evaluation summaries and metrics.
- Representative metadata and sparse score artifacts where available.
- GPT-labeled monosemantic atlas tables for selected features.
- Frozen clinical-transfer overlay tables.
- Paper-candidate evidence packets.
No raw audio is included in this model repository.
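Because checkpoints and some artifacts are present only when they have been packaged, it can help to check a local copy before running analysis. The following is a minimal sketch, assuming the repository has been downloaded to a local directory. The file list is taken from the paths named in this README, while `check_artifacts` and `EXPECTED_L24_FILES` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Paths for layer 24 as named in this README. Whether each one is actually
# present depends on how the release was packaged.
EXPECTED_L24_FILES = [
    "layers/l24/sae_layer_24.pt",
    "layers/l24/summary.json",
    "layers/l24/train_metrics.jsonl",
    "layers/l24/validation_metrics.jsonl",
    "layers/l24/test_eval/summary.json",
]


def check_artifacts(repo_root, expected=EXPECTED_L24_FILES):
    """Map each expected relative path to True if it exists as a file."""
    root = Path(repo_root)
    return {rel: (root / rel).is_file() for rel in expected}


if __name__ == "__main__":
    # Report on the current directory; change "." to the local checkout path.
    for rel, present in check_artifacts(".").items():
        print("present" if present else "missing", rel)
```

A missing checkpoint in this report is expected for layers listed as pending in the release table.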
Intended use
This release is for research on mechanistic interpretability of audio-language models: feature browsing, cross-layer comparison, feature-family analysis, retrieval/explanation experiments, and non-diagnostic clinical-audio transfer analysis.
Non-diagnostic clinical use limitation
The clinical-transfer overlays identify acoustic, speech, dialogue, or dataset/task correlates. They are not diagnostic medical models, and labels should not be interpreted as disease labels. Public respiratory datasets include prompted counting/task-recitation artifacts; those are reported separately from cough/breath features.
Training summary for L24
- Base model: google/gemma-4-12B-it
- Hidden-state site: layer 24 audio-token residual states
- Hidden size: 3840
- SAE dictionary size: 30,720
- Expansion factor: 8
- TopK: 128
- Steps: 100,000
- Batch size: 4,096
- Train tokens: 8,825,250
- Held-out test tokens: 2,102,500
- Held-out test FVE: 0.9173
- Held-out normalized L2: 0.0586
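The dimensions above are consistent with each other: the dictionary size equals the hidden size times the expansion factor (3840 × 8 = 30,720). The sketch below illustrates how a TopK SAE forward pass and a fraction-of-variance-explained (FVE) score are commonly computed. It is an assumption-laden illustration, not the training code for this release: the encoder/decoder parameterization, the bias handling, the ReLU on kept units, and the exact FVE definition are not stated in this README, and the example uses small toy dimensions with random weights.

```python
import numpy as np

# Values from the training summary; the product should match the dictionary size.
HIDDEN_SIZE = 3840
EXPANSION_FACTOR = 8
DICT_SIZE = HIDDEN_SIZE * EXPANSION_FACTOR  # 30720


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode, keep the k largest pre-activations per token, then decode.

    Assumed formulation (not confirmed by the release): subtract the decoder
    bias before encoding and apply ReLU to the kept units.
    """
    pre = (x - b_dec) @ W_enc + b_enc                   # (n_tokens, d_dict)
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]  # k largest per row
    rows = np.arange(pre.shape[0])[:, None]
    codes = np.zeros_like(pre)
    codes[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    x_hat = codes @ W_dec + b_dec                       # (n_tokens, d_model)
    return codes, x_hat


def fraction_variance_explained(x, x_hat):
    """1 - residual sum of squares / total sum of squares about the mean."""
    resid = np.sum((x - x_hat) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return 1.0 - resid / total


# Toy example: 16-dim hidden states, expansion 8, k = 4 (not the release sizes).
d_model, k = 16, 4
d_dict = d_model * EXPANSION_FACTOR
rng = np.random.default_rng(0)
x = rng.normal(size=(32, d_model))
W_enc = rng.normal(size=(d_model, d_dict)) / np.sqrt(d_model)
W_dec = W_enc.T.copy()
b_enc = np.zeros(d_dict)
b_dec = np.zeros(d_model)

codes, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
fve = fraction_variance_explained(x, x_hat)
```

With untrained random weights the toy FVE is not meaningful; the point is only the shape of the computation, with at most `k` nonzero codes per token.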
See layers/l24/summary.json, layers/l24/train_metrics.jsonl, layers/l24/validation_metrics.jsonl, and layers/l24/test_eval/summary.json for reproducible details.
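The `.jsonl` metric files can be read one JSON object per line. The following is a minimal sketch using only the standard library. The record schema is not documented in this README, so the helper makes no assumption about key names, and `read_jsonl` and `last_values` are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def last_values(records):
    """Return the most recent value seen for each key across all records."""
    latest = {}
    for rec in records:
        latest.update(rec)
    return latest


if __name__ == "__main__":
    metrics_path = Path("layers/l24/train_metrics.jsonl")
    if metrics_path.is_file():
        records = read_jsonl(metrics_path)
        print(len(records), "records; latest values:", last_values(records))
    else:
        print("metrics file not found:", metrics_path)
```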
Citation
Paper/preprint pending. If you use this artifact before the paper is available, cite the repository and version/commit hash.