Probing Visual Language Priors in VLMs

ImageDPO Finetuned Model

This page provides the ImageDPO-finetuned checkpoint of LLaVA-v1.5-13B used in Probing Visual Language Priors in VLMs. ImageDPO is a self-improving approach that enhances VLM visual reasoning by increasing the model's reliance on visual inputs, as illustrated in the image below. We provide the merged model weights for direct use.

[Image: ImageDPO method overview]

Usage

First, install the LLaVA-v1.5 codebase.

Then run the following command to try the model:

python -m llava.eval.run_llava \
    --model-path ViLP/LLaVA-v1.5-13b-ImageDPO \
    --image-file 'images/llava_logo.png' \
    --query 'Please caption this image.' \
    --conv-mode llava_v1

Citation Information

If you find this resource helpful, please consider citing the ViLP paper:

@article{luo2024probing,
    title={Probing Visual Language Priors in VLMs},
    author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
    journal={arXiv preprint arXiv:2501.00569},
    year={2024}
}