Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config

#2
by thirdinwinter

Hi there, I am running BGE-VL-MLLM-S2 with the official sample code (the BGE-VL-MLLM-S1 sample code):

[screenshot: the official sample code]

and I encountered the following warning:

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
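For context, the processor needs these two attributes to work out how many image placeholder tokens to expand per image. A rough sketch of the arithmetic, assuming a 336x336 CLIP ViT-L/14 vision tower (the numbers are illustrative; LLaVa-NeXT's any-resolution tiling then multiplies this per-tile count):

# patches per side for a 336px input with 14px patches
patches_per_side = 336 // 14          # = 24
num_features = patches_per_side ** 2  # = 576 patch features (+1 CLS token from the ViT)
# "default" strategy drops the CLS token, "full" keeps it
num_image_tokens = num_features       # 576 with "default"; 577 with "full"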

which comes from the following lines:
query_inputs = model.data_process(
    text=text,
    images=["img1.jpg", "img2.jpg"],
    q_or_c="q",
    task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: ",
)

Is there any way to fix it? I also found some similar solutions in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/discussions/34, but I still do not know the appropriate values for these two variables.
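For reference, the values suggested in that linked thread for llava-v1.6-style checkpoints are patch_size = 14 and vision_feature_select_strategy = "default". A minimal sketch of the direct fix, assuming the BGE-VL wrapper exposes its LLaVa-NeXT processor as model.processor (that attribute name is an assumption, not confirmed for this repo):

# hypothetical handle to the wrapped LlavaNextProcessor
processor = model.processor
processor.patch_size = 14                             # CLIP ViT-L/14 vision tower
processor.vision_feature_select_strategy = "default"  # LLaVA-NeXT default; drops the CLS token

Setting these once after loading the model should silence the warning, since the processor can then expand the image tokens itself instead of deferring to the model.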

Oops! Sorry for my mistake, the link should be https://huggingface.co/BAAI/BGE-VL-MLLM-S1/discussions/1
