Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published 16 days ago • 8
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction Paper • 2411.06424 • Published 19 days ago • 5