Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config
Hi there, I am running BGE-VL-MLLM-S2 with the official sample code (the BGE-VL-MLLM-S1 sample code), and I encountered the following warning:
```
Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
```
The warning comes from the following lines:
```python
query_inputs = model.data_process(
    text=text,
    images=["img1.jpg", "img2.jpg"],
    q_or_c="q",
    task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: "
)
```
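From the warning text, the fix appears to be setting the two attributes on the processor before any processing happens. As a stand-alone illustration on a stock llava-hf checkpoint (not BGE-VL; the values 14 and "default" are the usual ones for its CLIP ViT-L/14-336 vision tower, and I have not verified them for this model):

```python
from transformers import LlavaNextProcessor

# Stand-alone illustration on a stock LLaVA-NeXT checkpoint, not BGE-VL.
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
processor.patch_size = 14                             # ViT patch edge, in pixels
processor.vision_feature_select_strategy = "default"  # "default" drops the CLS token
```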
Is there any way to apply this fix here? I also found a similar solution in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/discussions/34, but I still do not know the appropriate values for these two variables.
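Adapting that to BGE-VL-MLLM is where I am stuck. The sketch below is my guess, assuming the wrapper keeps a standard LLaVA-NeXT config on `model.config` and exposes its processor as `model.processor` after `set_processor()`, neither of which I have confirmed in the custom code:

```python
# Guesswork: assumes model.config follows the standard LlavaNextConfig layout
# and that the wrapper stores its processor at model.processor.
model.processor.patch_size = model.config.vision_config.patch_size
model.processor.vision_feature_select_strategy = (
    model.config.vision_feature_select_strategy
)
```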
Oops, sorry for my mistake: the link should be https://huggingface.co/BAAI/BGE-VL-MLLM-S1/discussions/1