Qwen2.5-0.5B-Instruct

Sparse autoencoders (SAEs) trained on the residual stream of Qwen/Qwen2.5-0.5B-Instruct, one per layer for layers 16–22.

Training details

Architecture: Standard SAE (L1 penalty)
Hook: model.layers.{layer} (residual stream post-layer)
d_model: 896
Expansion: 16× → 14 336 features
Training tokens: 100M
Dataset: a mix of reasoning (NuminaMath-CoT), safety pairs (Anthropic HH-RLHF), jailbreak prompts (JailbreakBench), and general knowledge (FineWeb-Edu)
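To make the dimensions above concrete, here is a minimal NumPy sketch of a standard ReLU SAE with an L1 sparsity penalty. It is a generic textbook formulation, not the training code used for these checkpoints. The function names, the `l1_coeff` value, and the choice not to subtract the decoder bias before encoding are all assumptions made for illustration.

```python
import numpy as np

D_MODEL = 896                   # residual stream width (from the table above)
EXPANSION = 16                  # expansion factor (from the table above)
D_SAE = D_MODEL * EXPANSION     # number of SAE features


def n_params(d_model: int, d_sae: int) -> int:
    """Parameter count for encoder + decoder weights and both biases."""
    return 2 * d_model * d_sae + d_sae + d_model


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode with ReLU, then decode linearly.

    x: [batch, d_model], W_enc: [d_model, d_sae], W_dec: [d_sae, d_model].
    """
    feats = np.maximum(0.0, x @ W_enc + b_enc)   # sparse, non-negative features
    recon = feats @ W_dec + b_dec                # reconstruction of x
    return feats, recon


def sae_loss(x, feats, recon, l1_coeff):
    """Reconstruction error plus L1 penalty on feature activations."""
    mse = np.mean(np.sum((x - recon) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(feats), axis=-1))
    return mse + l1_coeff * l1


# Toy-sized demo so the sketch runs quickly; real shapes would be 896 and 14336.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 8 * EXPANSION
x = rng.normal(size=(4, d_model))
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
feats, recon = sae_forward(x, W_enc, np.zeros(d_sae), W_dec, np.zeros(d_model))
loss = sae_loss(x, feats, recon, l1_coeff=1e-3)  # l1_coeff is a hypothetical value
```

Some SAE implementations scale the L1 term by decoder norms or subtract the decoder bias from the input before encoding. The config stored alongside each checkpoint should be treated as the authority on which variant applies here.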

Repository structure

layer_16/final_sae/   ← SAE weights + config for layer 16
layer_17/final_sae/
...
layer_22/final_sae/
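The subfolder names follow a regular pattern, so they can be generated rather than typed by hand. The sketch below assumes the layout shown above holds for every layer from 16 to 22. The helper name `sae_subfolder` is hypothetical.

```python
LAYERS = range(16, 23)  # layers 16 through 22 inclusive


def sae_subfolder(layer: int) -> str:
    """Return the repository subfolder holding the SAE for `layer`."""
    if layer not in LAYERS:
        raise ValueError(f"no SAE for layer {layer}; expected 16 to 22")
    return f"layer_{layer}/final_sae"


subfolders = [sae_subfolder(layer) for layer in LAYERS]
```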

Usage

from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.load_from_pretrained(
    "HuggingAnalist/sae-qwen2.5-0.5B-res",
    subfolder="layer_20/final_sae",
)
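To apply a loaded SAE, you need residual stream activations from the matching layer. The following is a rough sketch using Hugging Face `transformers`, not a verified pipeline for this repository. It assumes that the hook `model.layers.{layer}` corresponds to the output of decoder layer `layer`, which in the `hidden_states` tuple sits at index `layer + 1` because index 0 holds the embedding output. The helper names are hypothetical, and the heavy imports are kept inside the function so the index helper can be used on its own.

```python
def hidden_state_index(layer: int) -> int:
    """Index into `hidden_states` for the output of decoder layer `layer`.

    hidden_states[0] is the embedding output, so layer i's output is at i + 1.
    """
    return layer + 1


def residual_activations(text: str, layer: int = 20):
    """Return post-layer residual activations, shape [1, seq_len, 896]."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen2.5-0.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[hidden_state_index(layer)]
```

With the `sae` object from the usage snippet, something like `sae.encode(residual_activations("2 + 2 = ?", layer=20))` should yield feature activations with a last dimension of 14 336, provided the SAE and the activations share the same dtype and device.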
