Qwen2.5-0.5B-Instruct
Sparse autoencoders (SAEs) trained on the residual stream of Qwen/Qwen2.5-0.5B-Instruct, one per layer for layers 16 to 22.
Training details
- Architecture: Standard SAE (L1 penalty)
- Hook: model.layers.{layer} (residual stream, post-layer output)
- d_model: 896
- Expansion: 16× (896 × 16 = 14,336 features)
- Training tokens: 100M
- Dataset: Mixed reasoning (NuminaMath-CoT) + safety pairs (Anthropic HH-RLHF) + jailbreak prompts (JailbreakBench) + general knowledge (FineWeb-Edu)
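The "standard SAE (L1 penalty)" objective can be sketched as follows. This is a minimal NumPy illustration, not the training code used for these weights: it assumes the common ReLU encoder with decoder-bias subtraction and a squared-error reconstruction term. The `l1_coeff` value, any decoder-norm weighting of the L1 term, and the random initialisation are assumptions for illustration only.

```python
import numpy as np

D_MODEL = 896                # residual-stream width of Qwen2.5-0.5B
EXPANSION = 16
D_SAE = D_MODEL * EXPANSION  # 14336 features


def init_sae(d_model, d_sae, seed=0):
    """Randomly initialise hypothetical SAE parameters (not the trained weights)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.02, size=(d_model, d_sae)),
        "b_enc": np.zeros(d_sae),
        "W_dec": rng.normal(0.0, 0.02, size=(d_sae, d_model)),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode residual-stream vectors x of shape (n, d_model) and reconstruct them."""
    # Subtract the decoder bias, project up, and apply ReLU to get non-negative features
    f = np.maximum((x - params["b_dec"]) @ params["W_enc"] + params["b_enc"], 0.0)
    # Project the features back down to the residual stream
    x_hat = f @ params["W_dec"] + params["b_dec"]
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff=5.0):
    """Batch-mean squared L2 reconstruction error plus an L1 penalty on features.

    l1_coeff=5.0 is a hypothetical value, not the one used in training.
    """
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(f), axis=-1))
    return mse + l1_coeff * l1
```

With `d_model = 896` and a 16× expansion, `W_enc` has shape (896, 14336) and `W_dec` has shape (14336, 896); the L1 term is what pushes most of the 14,336 features to zero on any given token.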
Repository structure
- layer_16/final_sae/: SAE weights + config for layer 16
- layer_17/final_sae/
- ...
- layer_22/final_sae/
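To fetch only some layers, the per-layer folder names can be generated from the layout above. A small sketch: the helper names `sae_subfolder` and `allow_patterns` are hypothetical, and the returned glob patterns are intended for the `allow_patterns` argument of `huggingface_hub.snapshot_download`.

```python
LAYERS = range(16, 23)  # layers 16 through 22 inclusive


def sae_subfolder(layer: int) -> str:
    """Return the repo subfolder holding the SAE for a given layer (hypothetical helper)."""
    if layer not in LAYERS:
        raise ValueError(
            f"no SAE for layer {layer}; available: {LAYERS.start}-{LAYERS.stop - 1}"
        )
    return f"layer_{layer}/final_sae"


def allow_patterns(layers) -> list[str]:
    """Glob patterns that select only the requested layers' SAE files."""
    return [f"{sae_subfolder(layer)}/*" for layer in layers]
```

For example, `snapshot_download("HuggingAnalist/sae-qwen2.5-0.5B-res", allow_patterns=allow_patterns([18, 20]))` would download just the layer-18 and layer-20 folders instead of all seven.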
Usage
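from huggingface_hub import snapshot_download
from sae_lens import SAE
# Fetch only the layer-20 SAE folder from the Hub
repo_dir = snapshot_download(
    "HuggingAnalist/sae-qwen2.5-0.5B-res",
    allow_patterns=["layer_20/final_sae/*"],
)
# load_from_pretrained takes a local directory and returns the SAE object
# (newer SAE Lens releases may name this SAE.load_from_disk)
sae = SAE.load_from_pretrained(f"{repo_dir}/layer_20/final_sae", device="cpu")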
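Once loaded, the SAE is applied to layer-20 residual-stream activations. One way to obtain them, assuming the Hugging Face `transformers` model, is `model(**inputs, output_hidden_states=True).hidden_states[20 + 1]`, since index 0 of `hidden_states` holds the embedding output. In SAE Lens, `sae.encode(acts)` then returns feature activations of shape (tokens, 14336). The sketch below summarises such a matrix after converting it to NumPy (e.g. with `.detach().cpu().numpy()`); the helper name `summarize_features` and the default `k` are illustrative choices.

```python
import numpy as np


def summarize_features(feature_acts: np.ndarray, k: int = 5):
    """Per-token L0 and top-k feature indices/values for a (tokens, d_sae) matrix."""
    # L0: number of features that fire (strictly positive) on each token
    l0 = (feature_acts > 0).sum(axis=-1)
    # Indices of the k largest activations per token, highest first
    top_idx = np.argsort(-feature_acts, axis=-1)[:, :k]
    top_vals = np.take_along_axis(feature_acts, top_idx, axis=-1)
    return l0, top_idx, top_vals
```

A mean L0 far below 14,336 is what the L1 penalty is meant to produce; the top-k indices are the feature IDs to inspect for a given token.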