Comes in a 3B size, with pretrained, mix, and fine-tuned models at 224, 448, and 896 resolution
Combines the Gemma 2B LLM with the SigLIP image encoder
Supported in transformers
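
To make "supported in transformers" concrete, here is a minimal sketch of loading a checkpoint and running one prompt. The checkpoint name `google/paligemma-3b-mix-224`, the helper names `build_prompt` and `run_paligemma`, and the task-prefix prompt strings are assumptions for illustration; check the model card for the prompts your checkpoint expects. Downloading the weights may also require accepting the model license on the Hub.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task-prefix prompt (assumed format; hypothetical helper)."""
    templates = {
        "caption": f"caption {lang}",
        "vqa": f"answer {lang} {text}",
        "detect": f"detect {text}",
        "segment": f"segment {text}",
    }
    if task not in templates:
        raise ValueError(f"unknown task: {task!r}")
    return templates[task]


def run_paligemma(image, prompt, model_id="google/paligemma-3b-mix-224",
                  max_new_tokens=50):
    """Generate text for one PIL image and one prompt (hypothetical helper)."""
    # Imported lazily so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    input_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)

    # Drop the prompt tokens and decode only the newly generated part.
    return processor.decode(output[0][input_len:], skip_special_tokens=True)
```

With a PIL image in hand, a call might look like `run_paligemma(image, build_prompt("vqa", "What is on the table?"))`.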
PaliGemma can do:
Image segmentation and detection! 🤯
Detailed document understanding and reasoning
Visual question answering, captioning, and any other VLM task!
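
For detection, the model answers in text, so the output has to be parsed back into boxes. The sketch below assumes the commonly documented output format: four `<locNNNN>` tokens per object in the order y_min, x_min, y_max, x_max, each a bin out of 1024, followed by the label, with objects separated by `;`. The function name `parse_detections` is hypothetical, and segmentation outputs (which carry extra mask tokens) are not handled here.

```python
import re

# Four location tokens followed by a label, e.g. "<loc0256><loc0128><loc0768><loc0896> cat"
_BOX_RE = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text: str, width: int, height: int) -> list[dict]:
    """Convert detection text into pixel boxes as (x_min, y_min, x_max, y_max)."""
    detections = []
    for match in _BOX_RE.finditer(text):
        # Assumed token order: y_min, x_min, y_max, x_max, normalized by 1024.
        y1, x1, y2, x2 = (int(v) / 1024 for v in match.groups()[:4])
        detections.append({
            "label": match.group(5).strip(),
            "box": (x1 * width, y1 * height, x2 * width, y2 * height),
        })
    return detections
```

The boxes come back scaled to the image size you pass in, so they can be handed directly to a drawing routine.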
Read our blog: hf.co/blog/paligemma
Try the demo: hf.co/spaces/google/paligemma
Check out the Spaces and the models, all in the collection: google/paligemma-release-6643a9ffbf57de2ae0448dda
Collection of fine-tuned PaliGemma models: google/paligemma-ft-models-6643b03efb769dad650d2dda