Abstract
Image representations are often evaluated through disjointed, task-specific protocols, leading to a fragmented understanding of model capabilities. For instance, it is unclear whether an image embedding model adept at clustering images is equally good at retrieving relevant images given a piece of text. We introduce the Massive Image Embedding Benchmark (MIEB) to evaluate the performance of image and image-text embedding models across the broadest spectrum to date. MIEB spans 38 languages across 130 individual tasks, which we group into 8 high-level categories. We benchmark 50 models on MIEB, finding that no single method dominates across all task categories. We reveal hidden capabilities of advanced vision models, such as their accurate visual representation of text, as well as their still-limited ability to produce interleaved encodings and to match images and texts in the presence of confounders. We also show that the performance of vision encoders on MIEB correlates highly with their performance when used in multimodal large language models. Our code, dataset, and leaderboard are publicly available at https://github.com/embeddings-benchmark/mteb.
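Since the benchmark is distributed through the linked `mteb` repository, a minimal usage sketch may help illustrate how a model could be evaluated on MIEB tasks. This is an assumption-laden illustration, not the authors' documented workflow: the benchmark name string and the model identifier below are placeholders, and the exact identifiers should be taken from the repository.

```python
# Hypothetical sketch: running an image/image-text embedding model on MIEB
# via the mteb package. The benchmark name and model name are illustrative
# assumptions; check the repository for the exact identifiers.
import mteb

# Select the MIEB task collection (benchmark name string assumed).
benchmark = mteb.get_benchmark("MIEB(Multilingual)")

# Load an image-text embedding model supported by mteb (model name assumed).
model = mteb.get_model("openai/clip-vit-base-patch32")

# Run the evaluation and write per-task scores to the output folder.
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results")
```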
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs (2025)
- FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations (2025)
- TULIP: Towards Unified Language-Image Pretraining (2025)
- IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval (2025)
- BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries (2025)
- Gemini Embedding: Generalizable Embeddings from Gemini (2025)
- VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents (2025)