These SAEs and transcoders can be loaded with the library at https://github.com/EleutherAI/sae.
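
For example, a minimal loading sketch, assuming the `sae` package from that repository and its `Sae.load_from_hub` entry point; the hookpoint name `layers.0.mlp` is an assumption about how this repo names its MLP hookpoints:

```python
# Sketch: load one transcoder from this repo with the EleutherAI sae library.
# The hookpoint string below is assumed, not confirmed, naming for this repo.
from sae import Sae

transcoder = Sae.load_from_hub(
    "EleutherAI/skip-transcoder-DeepSeek-R1-Distill-Qwen-1.5B-65k",
    hookpoint="layers.0.mlp",
)
```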

These transcoders were trained on the outputs of the first 15 MLPs in deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, using 10 billion tokens of deduplicated FineWeb-Edu at a context length of 2048. Each transcoder has 65,536 latents and includes a linear skip connection.
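
Conceptually, a skip transcoder predicts an MLP's output from the MLP's input, adding a learned linear (skip) term to the sparse dictionary reconstruction. A minimal sketch of that forward pass, assuming a TopK activation and using hypothetical parameter names and sparsity level:

```python
import torch

def skip_transcoder_forward(
    x: torch.Tensor,        # MLP input, shape (..., d_model)
    W_enc: torch.Tensor,    # (n_latents, d_model); n_latents = 65_536 here
    b_enc: torch.Tensor,    # (n_latents,)
    W_dec: torch.Tensor,    # (n_latents, d_model)
    b_dec: torch.Tensor,    # (d_model,)
    W_skip: torch.Tensor,   # (d_model, d_model) linear skip connection
    k: int = 32,            # assumed sparsity level
) -> torch.Tensor:
    # Encode, keep only the top-k latent activations, decode, and add
    # the linear skip term. All names here are illustrative.
    pre = x @ W_enc.T + b_enc
    top = torch.topk(pre, k, dim=-1)
    latents = torch.zeros_like(pre).scatter_(-1, top.indices, top.values.relu())
    return latents @ W_dec + b_dec + x @ W_skip.T
```

The intent of the skip term is to absorb the approximately linear part of the MLP's input-output map, freeing the sparse latents for its nonlinear structure.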

The fraction of variance unexplained (FVU) ranges from 0.01 to 0.37 across the trained layers.
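
For reference, FVU is the standard normalized reconstruction error; a minimal sketch of the usual computation, assuming a batch of target MLP outputs and their reconstructions:

```python
import torch

def fvu(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    # Fraction of variance unexplained: residual sum of squares over the
    # total variance of the targets around their (per-feature) mean.
    resid = (y_true - y_pred).pow(2).sum()
    total = (y_true - y_true.mean(dim=0)).pow(2).sum()
    return (resid / total).item()
```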
