Abstract
The field of psychology has long recognized a basic level of categorization that humans use when labeling visual stimuli, a term coined by Rosch in 1976. This level of categorization has been found to be the most frequently used, to carry higher information density, and to aid humans in visual language tasks through priming. Here, we investigate basic level categorization in two recently released, open-source vision-language models (VLMs). This paper demonstrates that Llama 3.2 Vision Instruct (11B) and Molmo 7B-D both prefer basic level categorization, consistent with human behavior. Moreover, the models' preferences are consistent with nuanced human behaviors such as the biological versus non-biological basic level effects and the well-established expert basic level shift, further suggesting that VLMs acquire cognitive categorization behaviors from the human data on which they are trained.
Community
Do large vision models categorize things at the basic level, as humans do? We find that VLMs show a basic level preference, shift away from the basic level under expert prompting, and exhibit a biological/non-biological interference pattern, all of which follow well-established results in the cognitive science literature.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The in-context inductive biases of vision-language models differ across modalities (2025)
- Testing the limits of fine-tuning to improve reasoning in vision language models (2025)
- Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans (2025)
- Human-like conceptual representations emerge from language prediction (2025)
- How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition (2025)
- Language Models Largely Exhibit Human-like Constituent Ordering Preferences (2025)
- Beyond Pattern Recognition: Probing Mental Representations of LMs (2025)