Qwen2.5-1.5B-Instruct Sparse Autoencoders

Sparse autoencoders (SAEs) trained on the residual stream of Qwen/Qwen2.5-1.5B-Instruct at layers 17, 18, 19, and 21 (layer 20 is not included).

Training Details

Architecture: Standard SAE (L1 penalty)
Hook: model.layers.{layer} (residual stream post-layer)
d_model: 1536
Expansion Factor: 16× → 24,576 features
Training Tokens: 100M
Dataset: Mixture of reasoning (NuminaMath-CoT), safety pairs (Anthropic HH-RLHF), jailbreak prompts (JailbreakBench), and general knowledge (FineWeb-Edu)
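
To make the numbers above concrete, here is a minimal NumPy sketch of a standard ReLU autoencoder with an L1 sparsity penalty. It is illustrative only: the initialisation, the `l1_coeff` value, and the function names are assumptions, not the actual training code or hyperparameters used for these SAEs.

```python
import numpy as np

D_MODEL = 1536                           # residual stream width (from the list above)
EXPANSION_FACTOR = 16
N_FEATURES = D_MODEL * EXPANSION_FACTOR  # 16 x 1536 = 24,576 features


def init_sae(d_model, n_features, seed=0):
    """Random parameters for a toy SAE (assumed initialisation scheme)."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 1.0 / np.sqrt(d_model), size=(d_model, n_features))
    w_dec = w_enc.T.copy()
    # Normalise each decoder row (one per feature) to unit length.
    w_dec /= np.linalg.norm(w_dec, axis=1, keepdims=True)
    return {
        "W_enc": w_enc,
        "b_enc": np.zeros(n_features),
        "W_dec": w_dec,
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode activations into non-negative features, then reconstruct."""
    features = np.maximum(x @ params["W_enc"] + params["b_enc"], 0.0)  # ReLU
    x_hat = features @ params["W_dec"] + params["b_dec"]
    return features, x_hat


def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus L1 penalty; l1_coeff is a placeholder value."""
    features, x_hat = sae_forward(params, x)
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(features), axis=-1))
    return mse + l1_coeff * l1


# Small demo sizes keep memory low; the real shapes would be (1536, 24576).
toy_params = init_sae(d_model=8, n_features=8 * EXPANSION_FACTOR)
toy_batch = np.random.default_rng(1).normal(size=(4, 8))
toy_features, toy_recon = sae_forward(toy_params, toy_batch)
```

The L1 term pushes most feature activations to zero, which is what makes the learned dictionary sparse.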

Repository Structure

layer_17/final_sae/   ← SAE weights + config for layer 17
...
layer_21/final_sae/

Usage

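from sae_lens import SAE

# The first argument is the Hugging Face repo ID; sae_id is the folder inside it.
# Note: newer SAELens releases return only the SAE from from_pretrained;
# drop the tuple unpacking if that applies to your installed version.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    "HuggingAnalist/sae-qwen2.5-1.5B-res",
    sae_id="layer_17/final_sae",
)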
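
As a rough sketch of how a loaded SAE might be applied, the code below runs the base model with `output_hidden_states=True`, takes the residual stream after a chosen layer, and encodes it. The helper names (`sae_folder`, `encode_prompt`) are hypothetical, and the folder names for layers other than 17 and 21 are assumed to follow the same `layer_{N}/final_sae` pattern. Running `encode_prompt` downloads both the model and the SAE weights.

```python
AVAILABLE_LAYERS = (17, 18, 19, 21)  # layer 20 is not provided
REPO_ID = "HuggingAnalist/sae-qwen2.5-1.5B-res"
MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"


def sae_folder(layer):
    """Folder inside the repo for a layer (naming pattern is an assumption)."""
    if layer not in AVAILABLE_LAYERS:
        raise ValueError(f"No SAE for layer {layer}; choose from {AVAILABLE_LAYERS}")
    return f"layer_{layer}/final_sae"


def encode_prompt(prompt, layer=17):
    """Return SAE feature activations for every token in the prompt."""
    # Heavy dependencies are imported lazily so the helper above stays lightweight.
    import torch
    from sae_lens import SAE
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    loaded = SAE.from_pretrained(release=REPO_ID, sae_id=sae_folder(layer))
    # Older SAELens versions return (sae, cfg_dict, sparsity); newer ones return the SAE.
    sae = loaded[0] if isinstance(loaded, tuple) else loaded

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so the output of
        # decoder layer `layer` sits at index layer + 1.
        resid = outputs.hidden_states[layer + 1]
        features = sae.encode(resid.to(sae.W_enc.dtype))
    return features  # shape: (batch, seq_len, n_features)


if __name__ == "__main__":
    feats = encode_prompt("What is 17 * 23?", layer=17)
    print(feats.shape)
```

Using `output_hidden_states` avoids registering a forward hook by hand; a hook on `model.layers.{layer}` should give the same tensor for these intermediate layers.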

