Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Abstract
Sparse Autoencoders (SAEs) have recently been shown to enhance interpretability and steerability in Large Language Models (LLMs). In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity in vision representations. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons while also exhibiting hierarchical representations that align well with expert-defined structures (e.g., the iNaturalist taxonomy). Most notably, we demonstrate that applying SAEs to intervene on a CLIP vision encoder directly steers the output of multimodal LLMs (e.g., LLaVA) without any modifications to the underlying model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised approach for enhancing both the interpretability and control of VLMs.
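The abstract describes training an SAE on vision-encoder activations and steering downstream outputs by intervening on its learned features. As a rough illustration of the general technique (not the paper's exact architecture — dimensions, sparsity coefficient, and the single-layer ReLU design here are assumptions typical of SAE work), a minimal sketch in PyTorch:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: an overcomplete dictionary with non-negative (ReLU) codes.

    Hypothetical sizes: d_model=768 matches a common CLIP vision-encoder
    width; d_hidden=8192 gives an overcomplete feature dictionary.
    """
    def __init__(self, d_model=768, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(z)           # reconstruction of the activations
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coef=1e-3):
    # reconstruction error plus an L1 penalty encouraging sparse codes
    return ((x - x_hat) ** 2).mean() + l1_coef * z.abs().mean()

# Toy batch standing in for CLIP vision-encoder activations.
sae = SparseAutoencoder()
x = torch.randn(32, 768)
x_hat, z = sae(x)
loss = sae_loss(x, x_hat, z)

# Steering sketch: amplify one learned feature, decode, and feed the
# modified activations back into the frozen downstream model.
z_steered = z.clone()
z_steered[:, 0] = 5.0
x_steered = sae.decoder(z_steered)
```

In the intervention setting the abstract describes, the steered reconstruction would replace the original activations inside the (unmodified) multimodal pipeline; the SAE itself is the only trained component.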
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Interpreting CLIP with Hierarchical Sparse Autoencoders (2025)
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment (2025)
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models (2025)
- Learning Multi-Level Features with Matryoshka Sparse Autoencoders (2025)
- Route Sparse Autoencoder to Interpret Large Language Models (2025)
- Steered Generation via Gradient Descent on Sparse Features (2025)
- Tokenized SAEs: Disentangling SAE Reconstructions (2025)