arxiv:2503.10357

Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

Published on Mar 13

Authors: Viktor Moskvoretskii, Alexander Panchenko, Irina Nikishina

Abstract

This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. While text-based methods for taxonomy enrichment are well-established, the potential of the visual dimension remains unexplored. To address this, we propose a comprehensive benchmark for Taxonomy Image Generation that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The benchmark includes common-sense and randomly sampled WordNet concepts, alongside LLM-generated predictions. The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback. Moreover, we pioneer the use of pairwise evaluation with GPT-4 feedback for image generation. Experimental results show that the ranking of models differs significantly from that on standard T2I tasks. Playground-v2 and FLUX consistently outperform the other models across metrics and subsets, while the retrieval-based approach performs poorly. These findings highlight the potential for automating the curation of structured data resources.
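To make the idea of pairwise evaluation concrete, here is a minimal sketch of how pairwise preference judgments could be aggregated into a model ranking. It is an illustration only and not the paper's procedure: the abstract does not say how judgments are aggregated, so the win-rate scheme, the function name `win_rates`, and the model names and judgments in the example are all assumptions.

```python
from collections import defaultdict


def win_rates(judgments):
    """Aggregate pairwise judgments into a per-model win rate.

    `judgments` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` is model_a, model_b, or None for a tie. A tie counts as half
    a win for each side. This scoring rule is an assumption made for
    illustration, not the aggregation used in the paper.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for model_a, model_b, winner in judgments:
        if winner not in (model_a, model_b, None):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        games[model_a] += 1
        games[model_b] += 1
        if winner is None:
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            wins[winner] += 1.0
    return {model: wins[model] / games[model] for model in games}


# Hypothetical judgments; the model names are placeholders, not real results.
example = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_x"),
    ("model_y", "model_z", None),
]
rates = win_rates(example)
ranking = sorted(rates, key=rates.get, reverse=True)
```

In this sketch the judge (a human annotator or an LLM such as GPT-4) only needs to say which of two images better depicts a concept, which is generally an easier call than assigning an absolute score to a single image.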
