arxiv:2407.19795

VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Published on Jul 29

Submitted by c-juhwan on Jul 30

Authors:

Juhwan Choi

Abstract

Domain generalizability is a crucial aspect of a deep learning model because it determines how well the model performs on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because suitable datasets are lacking. To address this gap, we propose VolDoGer: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that covers three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed VolDoGer by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. Using VolDoGer, we evaluated the domain generalizability of various models, ranging from fine-tuned models to a recent multimodal large language model.
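
To make "unseen domains" concrete, here is a minimal sketch of a leave-one-domain-out split, a common way to set up domain generalization experiments. It is not taken from the paper: the function name `leave_one_domain_out`, the `"domain"` key, and the toy domain names are all hypothetical, and VolDoGer's actual schema and evaluation protocol may differ.

```python
from collections import defaultdict


def leave_one_domain_out(examples):
    """Yield (held_out_domain, train, test) once per domain.

    `examples` is a list of dicts carrying a "domain" key
    (a hypothetical schema, not VolDoGer's actual format).
    """
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)

    for held_out in sorted(by_domain):
        # Train on every domain except the held-out one.
        train = [
            ex
            for domain, items in by_domain.items()
            if domain != held_out
            for ex in items
        ]
        # Test only on the held-out (unseen) domain.
        test = list(by_domain[held_out])
        yield held_out, train, test


# Toy data; the domain names are placeholders chosen for illustration.
toy = [
    {"domain": "photo", "caption": "a dog on grass"},
    {"domain": "sketch", "caption": "a dog on grass"},
    {"domain": "painting", "caption": "a cat on a sofa"},
]
splits = list(leave_one_domain_out(toy))
```

Each split trains on all but one domain and tests on the remaining one, so the test score reflects performance on a domain the model never saw during training.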


Community

c-juhwan (paper author, paper submitter)

Jul 30

A paper about LLM-based data annotation, especially in a multimodal setup.


nielsr

Aug 3

Hi @c-juhwan congrats on your work!

Are you planning on sharing the VolDoGer dataset on the hub? See here for a guide: https://huggingface.co/docs/datasets/loading.

It can then also be linked to this paper so that people are able to discover it: https://huggingface.co/docs/hub/en/paper-pages#linking-a-paper-to-a-model-dataset-or-space

Let me know if you need any help!

Cheers,
Niels
Open-source @ HF
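
As a rough illustration of the linking step mentioned above: the Hub associates a repository with a paper page when the repository's README.md contains a link to the arXiv paper. The sketch below only builds such README text; the helper name `dataset_card` and the card wording are hypothetical, and uploading the file to a dataset repository is left out.

```python
ARXIV_ID = "2407.19795"  # arXiv identifier of this paper


def dataset_card(name: str, arxiv_id: str) -> str:
    """Return minimal README.md text that cites the arXiv paper.

    The wording is illustrative only; the arXiv link is the part
    that lets the Hub connect the repository to the paper page.
    """
    url = f"https://arxiv.org/abs/{arxiv_id}"
    return f"# {name}\n\nThis dataset accompanies the paper: {url}\n"


card = dataset_card("VolDoGer", ARXIV_ID)
```

Saving `card` as README.md in the dataset repository would be the remaining step; the guide linked in the comment above describes the supported workflow.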


