Attention Output SAEs Paper
This repository contains Attention Output SAEs for GELU-2L, GPT-2 Small and Gemma-2B. More details are available in the accompanying paper.
Important details of SAE training include:
- SAE Widths. Our GELU-2L and Gemma-2B SAEs have width 16384. All of our GPT-2 Small SAEs have width 24576, with the exception of layers 5 and 7, which have width 49152.
- Loss Function. We trained our Gemma-2B SAE with a different loss function than the SAEs for the other models. For Gemma-2B we closely follow the approach from Olah et al., while for GELU-2L and GPT-2 Small we closely follow the approach from Bricken et al.
- Training Data. We use hundreds of millions to billions of activations from LM forward passes as input data to the SAE. Following Nanda, we use a shuffled buffer of these activations, so that optimization steps don’t use data from highly correlated activations. For GELU-2L we use a mixture of 80% C4 webtext and 20% code (https://huggingface.co/datasets/NeelNanda/c4-code-tokenized-2b). For GPT-2 Small we use OpenWebText (https://huggingface.co/datasets/Skylion007/openwebtext). For Gemma-2B we use FineWeb (https://huggingface.co/datasets/HuggingFaceFW/fineweb).
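The shuffled buffer mentioned under Training Data can be illustrated with a minimal sketch. This is not the training code used for these SAEs. The function name `shuffled_batches`, its parameters, and the "serve half, keep half" refill policy are assumptions chosen for illustration. The idea is only that activations from neighbouring token positions are mixed with activations from other contexts before they reach an optimization step.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffled_batches(
    activations: Iterable[T],
    buffer_size: int,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield batches from a shuffled buffer over a stream of activations.

    Hypothetical helper: the refill policy (serve half the buffer, keep the
    other half to mix with fresh activations) is an assumption, not a
    description of the authors' pipeline.
    """
    if batch_size <= 0 or buffer_size < 2 * batch_size:
        raise ValueError("need batch_size > 0 and buffer_size >= 2 * batch_size")

    rng = random.Random(seed)
    buffer: List[T] = []

    for act in activations:
        buffer.append(act)
        if len(buffer) >= buffer_size:
            rng.shuffle(buffer)
            half = buffer_size // 2
            # Serve the first half; the second half stays to mix with new data.
            serve, buffer = buffer[:half], buffer[half:]
            for i in range(0, len(serve) - batch_size + 1, batch_size):
                yield serve[i:i + batch_size]
            # Items that did not fill a whole batch go back into the buffer.
            remainder = len(serve) % batch_size
            if remainder:
                buffer.extend(serve[-remainder:])

    # Stream exhausted: shuffle and flush what is left (last batch may be short).
    rng.shuffle(buffer)
    for i in range(0, len(buffer), batch_size):
        yield buffer[i:i + batch_size]
```

In practice each item would be an activation vector (for example a tensor row) rather than a plain Python object, and the buffer would typically hold far more activations than a single batch, so that one optimization step sees activations from many unrelated contexts.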