Qwen2.5-1.5B-Instruct Sparse Autoencoders

Sparse Autoencoders trained on the residual stream of Qwen/Qwen2.5-1.5B-Instruct for layers 17–21, except layer 20.

Training Details

  • Architecture: Standard SAE (L1 penalty)
  • Hook: model.layers.{layer} (residual stream post-layer)
  • d_model: 1536
  • Expansion Factor: 16× → 24,576 features
  • Training Tokens: 100M
  • Dataset: Mixed reasoning (NuminaMath-CoT) + safety pairs (Anthropic HH-RLHF) + jailbreak prompts (JailbreakBench) + general knowledge (FineWeb-Edu)
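As a rough illustration of what the settings above imply, the sketch below works out the dictionary size from d_model and the expansion factor, then runs a toy forward pass of a standard SAE with an L1 penalty. The formulation shown (ReLU encoder, linear decoder, mean squared error plus an L1 term) is one common version of this architecture; the toy dimensions, random weights, `l1_coeff` value, and function names are assumptions for illustration, not the training code behind these checkpoints.

```python
import random

D_MODEL = 1536
EXPANSION_FACTOR = 16
D_SAE = D_MODEL * EXPANSION_FACTOR  # 24,576 features, as listed above


def weight_count(d_model: int, d_sae: int) -> int:
    """Parameters in a standard SAE: encoder and decoder matrices plus both biases."""
    return 2 * d_model * d_sae + d_sae + d_model


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Toy standard SAE: f = ReLU(W_enc x + b_enc), x_hat = W_dec f + b_dec."""
    pre = matvec(W_enc, x)
    feats = [max(0.0, p + b) for p, b in zip(pre, b_enc)]
    recon = [r + b for r, b in zip(matvec(W_dec, feats), b_dec)]
    mse = sum((xi - ri) ** 2 for xi, ri in zip(x, recon)) / len(x)
    l1 = sum(feats)  # features are non-negative after ReLU, so the sum is the L1 norm
    return feats, recon, mse + l1_coeff * l1


# Toy sizes so the example runs instantly; the real SAEs use 1536 -> 24,576.
random.seed(0)
d_model, d_sae = 4, 8
W_enc = [[random.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_sae)]
W_dec = [[random.gauss(0, 0.5) for _ in range(d_sae)] for _ in range(d_model)]
x = [random.gauss(0, 1) for _ in range(d_model)]
feats, recon, loss = sae_forward(
    x, W_enc, [0.0] * d_sae, W_dec, [0.0] * d_model, l1_coeff=1e-3
)
print(f"features: {D_SAE}, weights per SAE: {weight_count(D_MODEL, D_SAE)}")
print(f"toy active features: {sum(f > 0 for f in feats)} of {d_sae}")
```

The L1 term is what pushes most feature activations to zero, so raising `l1_coeff` trades reconstruction quality for sparsity.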

Repository Structure

layer_17/final_sae/   ← SAE weights + config for layer 17
...
layer_21/final_sae/
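The layout above can be turned into subfolder names programmatically. This is a minimal sketch that assumes the elided folders follow the same `layer_{n}/final_sae` pattern and that layer 20 is absent, as stated in the description; `LAYERS` and `sae_subfolder` are hypothetical names introduced here.

```python
# Layers covered by this repository: 17-21, skipping 20 (per the description).
LAYERS = [17, 18, 19, 21]


def sae_subfolder(layer: int) -> str:
    """Return the assumed repository subfolder for a given layer's SAE."""
    if layer not in LAYERS:
        raise ValueError(f"no SAE listed for layer {layer}; expected one of {LAYERS}")
    return f"layer_{layer}/final_sae"


subfolders = [sae_subfolder(layer) for layer in LAYERS]
for path in subfolders:
    print(path)
```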

Usage

from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained(
    "HuggingAnalist/sae-qwen2.5-1.5B-res",
    subfolder="layer_17/final_sae",
)
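Once an SAE is loaded, it can be applied to residual-stream activations from the base model. The sketch below is one possible way to do that with Hugging Face Transformers, under several assumptions: the activation after `model.layers.{layer}` is taken from `outputs.hidden_states[layer + 1]` (index 0 is the embedding output), the SAE expects float32 inputs, and `SAE.from_pretrained` accepts the same arguments and returns the same tuple as in the snippet above, which may vary between sae_lens versions. `hidden_state_index` and `encode_prompt` are hypothetical helper names.

```python
def hidden_state_index(layer: int) -> int:
    """Index into `outputs.hidden_states` holding the output of decoder layer `layer`.

    hidden_states[0] is the embedding output, so layer k's output sits at k + 1.
    """
    return layer + 1


def encode_prompt(prompt: str, layer: int = 17):
    """Run the base model on `prompt` and return SAE feature activations."""
    # Heavy imports stay inside the function so the helper above has no dependencies.
    import torch
    from sae_lens import SAE
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    # Mirrors the loading call shown above (assumed to work as written there).
    sae, cfg_dict, sparsity = SAE.from_pretrained(
        "HuggingAnalist/sae-qwen2.5-1.5B-res",
        subfolder=f"layer_{layer}/final_sae",
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
        # Residual stream after the chosen layer: (batch, seq_len, 1536)
        resid = outputs.hidden_states[hidden_state_index(layer)]
        # Sparse feature activations: (batch, seq_len, 24576)
        feats = sae.encode(resid.to(torch.float32))
    return feats


if __name__ == "__main__":
    features = encode_prompt("What is 12 * 7?", layer=17)
    print(features.shape)
```

Reading activations from `output_hidden_states` avoids registering a forward hook; a hook on the decoder layer module would be an alternative if only a single layer is needed and memory is tight.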