arxiv:2407.07726

PaliGemma: A versatile 3B VLM for transfer

Published on Jul 10 · Submitted by akhaliq on Jul 11
#1 Paper of the day

Abstract

PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that transfers effectively, and it achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks as well as more specialized tasks such as remote sensing and segmentation.

Community

Also read hf.co/blog/paligemma

Are the fine-tuned models going to be available on Hugging Face?

Models citing this paper 115

Datasets citing this paper 0

Spaces citing this paper 28

Collections including this paper 16