Sparse Autoencoders for Phi-3-mini-4k-instruct

These are trained Sparse Autoencoders (SAEs) for the microsoft/Phi-3-mini-4k-instruct model, compatible with the sae_lens library.

Repository Structure

  • layer_{L}/ - Contains the final SAELens-compatible safetensors weights for each target layer, where {L} is the layer index (e.g., layer_21/).
  • checkpoints/ - Contains the intermediate PyTorch .pt step checkpoints saved during training.

Model Details

  • Model: microsoft/Phi-3-mini-4k-instruct
  • Layers: [21, 22, 23]
  • Architecture: Standard (ReLU) with pre-encoder centring (b_dec is subtracted from the input before encoding).
  • Expansion Factor: 16x (49152 features)
  • Tokens Trained: ~1034M
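As a rough illustration of the architecture listed above, the following is a minimal pure-Python sketch of a standard ReLU SAE forward pass with pre-encoder centring. It is not the training code for these SAEs. The function names (sae_forward, matvec, relu) are hypothetical, and the matrices are stored as lists of rows for readability, which may differ from the orientation SAELens uses internally. The value 3072 is the hidden size of Phi-3-mini, which gives the 49152 features quoted above.

```python
D_MODEL = 3072                 # hidden size of Phi-3-mini
EXPANSION_FACTOR = 16
D_SAE = D_MODEL * EXPANSION_FACTOR   # 49152 features


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Toy forward pass (hypothetical helper, not the SAELens API).

    W_enc has d_sae rows of length d_in; W_dec has d_in rows of length d_sae.
    """
    # Pre-encoder centring: subtract the decoder bias from the input.
    centred = [xi - bi for xi, bi in zip(x, b_dec)]
    # Encode, then apply ReLU to obtain the sparse feature activations.
    pre_acts = [p + b for p, b in zip(matvec(W_enc, centred), b_enc)]
    feats = relu(pre_acts)
    # Decode and add the decoder bias back.
    recon = [r + b for r, b in zip(matvec(W_dec, feats), b_dec)]
    return feats, recon
```

With identity weights on a 2-dimensional toy input, the centring step is easy to follow by hand: the input [3, 0] with b_dec = [1, 1] is centred to [2, -1], the ReLU zeroes the negative entry, and the decoder adds b_dec back.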

Datasets

  • NuminaMath-CoT: 40%
  • HH-RLHF: 20%
  • FineWeb: 20%
  • JBB/HarmBench/AdvBench: 20%
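To make the mixture concrete, here is a small sketch of how the percentages above translate into approximate token budgets and into weighted sampling of data sources. It assumes the percentages are measured in tokens and that the total is exactly 1034M; neither is stated on this card, and the actual data pipeline may differ. The names MIXTURE, token_budget, and sample_sources are hypothetical.

```python
import random

# Mixture weights copied from the list above.
MIXTURE = {
    "NuminaMath-CoT": 0.40,
    "HH-RLHF": 0.20,
    "FineWeb": 0.20,
    "JBB/HarmBench/AdvBench": 0.20,
}
TOTAL_TOKENS = 1_034_000_000  # assumption: "~1034M" taken as an exact count


def token_budget(mixture, total):
    """Approximate number of tokens drawn from each source."""
    return {name: round(total * weight) for name, weight in mixture.items()}


def sample_sources(mixture, n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    for name, tokens in token_budget(MIXTURE, TOTAL_TOKENS).items():
        print(f"{name}: ~{tokens / 1e6:.0f}M tokens")
```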

Usage with SAELens

from sae_lens import SAE

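# Load the final SAE for a specific layer, e.g., layer 21.
# The three-value return matches older SAELens releases; on newer releases,
# check whether from_pretrained returns only the SAE object.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="analist/SAE-Phi-3-mini-4k-instruct",
    sae_id="layer_21",
    device="cuda",  # use "cpu" if no GPU is available
)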
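To load the SAEs for all three layers at once, a loop over the layer indices is enough. The sketch below assumes the sae_id for each layer follows the layer_{L} naming used in this repository. The helper names sae_ids and load_all are hypothetical, and the tuple check is there only because the return type of from_pretrained may differ between SAELens releases.

```python
RELEASE = "analist/SAE-Phi-3-mini-4k-instruct"
LAYERS = [21, 22, 23]


def sae_ids(layers):
    """Build the sae_id string for each layer, e.g. 21 -> "layer_21"."""
    return [f"layer_{layer}" for layer in layers]


def load_all(device="cpu"):
    """Load one SAE per layer and return them keyed by layer index."""
    from sae_lens import SAE  # imported lazily so the helpers above work without it

    saes = {}
    for layer, sae_id in zip(LAYERS, sae_ids(LAYERS)):
        loaded = SAE.from_pretrained(release=RELEASE, sae_id=sae_id, device=device)
        # Some releases return (sae, cfg_dict, sparsity); keep only the SAE.
        saes[layer] = loaded[0] if isinstance(loaded, tuple) else loaded
    return saes
```

Calling load_all(device="cuda") would then give a dictionary such as {21: sae_21, 22: sae_22, 23: sae_23}, assuming the download succeeds.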