Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper
• 2309.08600 • Published
A collection of papers I found useful for learning how to use Sparse Autoencoders to find interpretable features in language models.