How to Format Video Input for SageMaker-Hosted Hugging Face Model Endpoint?
#11
by
kkkssshh
Hello Hugging Face community,
I’m currently working on deploying a Hugging Face model (specifically, the LLaVA model) on AWS SageMaker and want to use it to analyze videos. However, I’m running into issues with correctly formatting and sending video input to the deployed model via the SageMaker endpoint.
The problem:
• I have successfully deployed the LLaVA model using SageMaker.
• I want to send video data to the model for analysis.
• However, I am unsure what data format the model expects for video input. Specifically:
• Should I send the video as a Base64-encoded string, a URL, or a raw video file?
• Is there any specific format or pre-processing step required for videos to be ingested by the model?
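For illustration, here is a minimal sketch of the kind of request I’m currently trying to send, assuming the Base64-in-JSON approach. The payload key names (`inputs`, `prompt`, `video`) and the endpoint name are my guesses, since I don’t know the schema the deployed inference script actually expects:

```python
import base64
import json


def build_payload(video_bytes: bytes, prompt: str) -> str:
    """Package a prompt plus a Base64-encoded video into a JSON body.

    The schema here ("inputs" / "prompt" / "video") is an assumption,
    not a documented contract for the LLaVA container.
    """
    return json.dumps({
        "inputs": {
            "prompt": prompt,
            "video": base64.b64encode(video_bytes).decode("utf-8"),
        }
    })


# Invoking the endpoint would then look roughly like this
# (endpoint name is a placeholder):
#
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="my-llava-endpoint",
#     ContentType="application/json",
#     Body=build_payload(open("clip.mp4", "rb").read(), "Describe this video."),
# )
# print(response["Body"].read().decode("utf-8"))
```

Is this the right general shape, or does the container expect something else entirely (e.g., multipart upload or an S3 URI)?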
Additional Context:
• I am serving the model via a SageMaker endpoint, and I need to know how to structure the video input (in JSON or another format) so the model receives and processes it correctly.
• I am aware that the model was trained on sampled video frames (e.g., 32 frames per clip), and I want to make sure the video data I pass in is compatible with that.
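On the 32-frame point: my current plan is to sample frames uniformly from the clip client-side before encoding them. A minimal sketch of the index-selection step (the actual frame extraction, e.g. with OpenCV or ffmpeg, is omitted; `num_frames=32` is my assumption based on how the model was reportedly trained):

```python
def sample_frame_indices(total_frames: int, num_frames: int = 32) -> list[int]:
    """Pick `num_frames` roughly evenly spaced frame indices from a clip.

    If the clip has fewer frames than requested, every frame is used.
    """
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


# e.g. for a 320-frame clip this picks every 10th frame: 0, 10, ..., 310
```

Is this kind of client-side uniform sampling appropriate, or does the deployed inference script handle frame extraction itself?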
Any help or guidance on how to structure the data or format the video input for SageMaker and the Hugging Face model would be greatly appreciated!
Thank you in advance!