arxiv:2407.07726

PaliGemma: A versatile 3B VLM for transfer

Published on Jul 10 · Submitted by akhaliq on Jul 11

#1 Paper of the day

Authors:

Lucas Beyer, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Daniel Keysers, Skanda Koppula, Fangyu Liu, Alexey Gritsenko, Neil Houlsby, Julian Eisenschlos, and others

Abstract

PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that transfers effectively. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks as well as more specialized tasks such as remote sensing and segmentation.


Community


merve

Jul 11

also read hf.co/blog/paligemma

jerpint

Jul 11

are the finetuned models going to be available on huggingface?

Jul 12


I think they are already available:
https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda
https://huggingface.co/collections/google/paligemma-ft-models-6643b03efb769dad650d2dda


Models citing this paper 115


Datasets citing this paper 0


Spaces citing this paper 28

Collections including this paper 16