jbloom/Gemma-2b-Residual-Stream-SAEs

jbloom committed on May 21

Commit edc8777 • 1 Parent(s): c50b27f

Update README.md

Files changed (1)

README.md +13 -0

@@ -53,4 +53,17 @@ Notes:
   - Excepting activation normalization.
   - We increased the learning rate here by one order of magnitude to explore whether this resulted in faster training (in particular, reaching a lower L0 more quickly).
     - We find in practice that the drop in L0 is accelerated, but this results in significantly more dead features (likely causing worse reconstruction).
+- As above, it is likely under-trained.
+## Resid Post 12
+Stats:
+- 16384 features (expansion factor 8)
+- CE Loss score of 95.99% (2.563 without SAE, 2.96 with the SAE)
+- Mean L0 of 52 (in practice, L0 is log-normally distributed and heavily right-tailed).
+- Dead features: fewer than 200.
+Notes:
+- This SAE was trained with methods from the Anthropic [April Update](https://transformer-circuits.pub/2024/april-update/index.html#training-saes)
+  - **With activation normalization**. This means that activations should be multiplied by a constant such that E(|X|) = sqrt(2048), where 2048 is the width of Gemma-2b's residual stream.
 - As above, it is likely under-trained.
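
The activation normalization described in the Resid Post 12 notes can be sketched as follows. This is a minimal illustration, not the author's training code: it assumes `|X|` denotes the L2 norm of a single activation vector, it estimates the constant from a small random sample rather than from real Gemma-2b activations, and the names `l2_norm`, `estimate_scaling_constant`, and `apply_scaling` are hypothetical.

```python
import math
import random

D_MODEL = 2048  # residual-stream width implied by sqrt(2048) in the note


def l2_norm(vec):
    """L2 norm of one activation vector (assumed meaning of |X|)."""
    return math.sqrt(sum(v * v for v in vec))


def estimate_scaling_constant(activations, d_model=D_MODEL):
    """Return c such that the mean L2 norm of c * x equals sqrt(d_model)."""
    mean_norm = sum(l2_norm(x) for x in activations) / len(activations)
    return math.sqrt(d_model) / mean_norm


def apply_scaling(activations, c):
    """Multiply every activation vector by the constant c."""
    return [[c * v for v in x] for x in activations]


# Stand-in data: random vectors, NOT real model activations.
random.seed(0)
sample = [[random.gauss(0.0, 3.0) for _ in range(D_MODEL)] for _ in range(32)]

c = estimate_scaling_constant(sample)
scaled = apply_scaling(sample, c)
mean_scaled_norm = sum(l2_norm(x) for x in scaled) / len(scaled)
```

Because a single scalar is applied to every vector, the mean norm of the scaled sample matches `sqrt(2048)` up to floating-point error. In practice the constant would presumably be estimated once over a representative batch of activations and then reused whenever the SAE is run.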