Video Inference - TypeError: process_vision_info() got an unexpected keyword argument 'return_video_kwargs'
This is the piece of code I am trying to execute:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": image_path,
                "max_pixels": 360 * 420,
                "fps": 1.0,
            },
            {"type": "text", "text": query},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print("Video Text ", text)
image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    fps=fps,
    padding=True,
    return_tensors="pt",
    **video_kwargs,
)
inputs = inputs.to("cuda")
Output
Video Text <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|video_pad|><|vision_end|>Describe<|im_end|>
<|im_start|>assistant
The error is thrown on this line:
image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
TypeError: process_vision_info() got an unexpected keyword argument 'return_video_kwargs'
Also running into the same issue.
+1
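pip install qwen-vl-utils==0.0.10
fixes the issue. The error suggests that the previously installed qwen-vl-utils release did not yet accept the return_video_kwargs argument.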
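If pinning the package version is not an option, one way to guard against this kind of TypeError is to check at runtime whether the function accepts the keyword before passing it. The following is only a minimal sketch, not part of qwen-vl-utils: accepts_kwarg, installed_version and call_process_vision_info are hypothetical helper names, and the two stand-in functions only imitate an older and a newer signature so that the snippet runs without the library installed.

```python
import inspect
from importlib import metadata


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical stand-ins that imitate an older and a newer signature.
# They are NOT the real qwen_vl_utils implementations.
def old_process_vision_info(conversations):
    return None, None


def new_process_vision_info(conversations, return_video_kwargs=False):
    if return_video_kwargs:
        return None, None, {}
    return None, None


def call_process_vision_info(func, messages):
    """Call `func` and always return (images, videos, video_kwargs)."""
    if accepts_kwarg(func, "return_video_kwargs"):
        return func(messages, return_video_kwargs=True)
    # Older signature: no video kwargs available, so fall back to an empty dict.
    images, videos = func(messages)
    return images, videos, {}


if __name__ == "__main__":
    print("qwen-vl-utils installed:", installed_version("qwen-vl-utils"))
    print(call_process_vision_info(old_process_vision_info, []))
    print(call_process_vision_info(new_process_vision_info, []))
```

With the real library, the function imported via from qwen_vl_utils import process_vision_info would be passed in place of the stand-ins. Upgrading the package as suggested above is still the simpler fix; the fallback only avoids the crash and yields an empty video_kwargs dict on older versions.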