Qwen2.5-1.5B-Instruct Sparse Autoencoders
Sparse autoencoders (SAEs) trained on the residual stream of Qwen/Qwen2.5-1.5B-Instruct at layers 17–21, excluding layer 20 (that is, layers 17, 18, 19, and 21).
Training Details
- Architecture: Standard SAE (L1 penalty)
- Hook: model.layers.{layer} (residual stream post-layer)
- d_model: 1536
- Expansion Factor: 16× → 24,576 features
- Training Tokens: 100M
- Dataset: Mixed reasoning (NuminaMath-CoT) + safety pairs (Anthropic HH-RLHF) + jailbreak prompts (JailbreakBench) + general knowledge (FineWeb-Edu)
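The figures above can be checked with a little arithmetic, and the "Standard SAE (L1 penalty)" objective can be sketched in a few lines. The block below is a minimal illustration in pure Python, assuming the common formulation (ReLU encoder, linear decoder, mean squared reconstruction error plus an L1 term on the feature activations). It is not the training code used for these checkpoints. The helper names and the `l1_coeff` value are hypothetical, since the card does not state the coefficient.

```python
# Illustrative sketch of a standard L1-penalised SAE on toy data.
# Not the training code for the released weights; helper names are hypothetical.

D_MODEL = 1536                       # residual stream width (from the card)
EXPANSION_FACTOR = 16                # from the card
D_SAE = D_MODEL * EXPANSION_FACTOR   # dictionary size: 24,576 features


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Return (loss, features) for a single activation vector x.

    l1_coeff is an assumed hyperparameter; its real value is not given in the card.
    """
    # Encoder: features = ReLU(W_enc @ x + b_enc)
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    feats = [max(0.0, p) for p in pre]
    # Decoder: x_hat = W_dec @ features + b_dec
    x_hat = [r + b for r, b in zip(matvec(W_dec, feats), b_dec)]
    # Reconstruction error plus sparsity penalty
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(f) for f in feats)
    return mse + l1_coeff * l1, feats


# Toy example: 2-dimensional input, 4 features (real sizes are 1536 and 24,576).
x = [1.0, -1.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]
loss, feats = sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.5)
```

In the toy example only two of the four features fire, and the input is reconstructed exactly, so the loss comes entirely from the L1 term. Some SAE implementations also subtract the decoder bias from the input before encoding; that variant is omitted here for brevity.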
Repository Structure
layer_17/final_sae/ ← SAE weights + config for layer 17
...
layer_21/final_sae/
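To iterate over the available SAEs, the subfolder names can be generated from the layer numbers. This is a small sketch that assumes the elided entries follow the same layer_{N}/final_sae pattern as the two shown, and that layer 20 has no folder, as the description states. The variable names are illustrative; check the repository's file listing before relying on these paths.

```python
# Assumption: every trained layer uses the layer_{N}/final_sae naming shown above.
TRAINED_LAYERS = [layer for layer in range(17, 22) if layer != 20]

# Map each layer number to its presumed subfolder in the repository.
SUBFOLDERS = {layer: f"layer_{layer}/final_sae" for layer in TRAINED_LAYERS}

for layer, path in SUBFOLDERS.items():
    print(layer, path)
```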
Usage
from sae_lens import SAE

# Load the layer-17 SAE; change the subfolder to select another layer.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    "HuggingAnalist/sae-qwen2.5-1.5B-res",
    subfolder="layer_17/final_sae",
)