Probing Visual Language Priors in VLMs
ImageDPO Finetuned Model
This page provides the ImageDPO-finetuned checkpoint of LLaVA-v1.5-13B used in Probing Visual Language Priors in VLMs. ImageDPO is a self-improvement approach that enhances a VLM's visual reasoning by increasing its reliance on visual inputs, as illustrated in the image below. We offer the merged model weights for direct use.
Usage
First, install the LLaVA-v1.5 codebase.
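A typical way to set up the codebase is to clone the official LLaVA repository and install it in editable mode (a sketch; see the LLaVA repository's README for the authoritative, up-to-date instructions):

```shell
# Clone the official LLaVA repository and install it as an editable package
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
```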
Then run the following command to try the model:
python -m llava.eval.run_llava \
--model-path ViLP/LLaVA-v1.5-13b-ImageDPO \
--image-file 'images/llava_logo.png' \
--query 'Please caption this image.' \
--conv-mode llava_v1
Citation Information
If you find our resources helpful, please consider citing the ViLP paper:
@article{luo2024probing,
title={Probing Visual Language Priors in VLMs},
author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
journal={arXiv preprint arXiv:2501.00569},
year={2024}
}