Add Video Llava to transformers

#5
by RaushanTurganbay - opened

Hello!

We have received a request to add video-Llava to the library. Would you be interested in contributing?

I'm very interested, but I'm afraid I don't have enough bandwidth at the moment. Can you help me?

Great, sounds good to me! I am more than happy to help. What I would suggest is that I take care of integrating the model code in transformers style and writing the tests, as those are the most time-consuming parts and you are out of bandwidth. It would be awesome if you could take care of the model weights, the model card on the Hub, and any tips on getting maximum performance out of Video-LLaVA.

What do you think?

This is excellent, because Video-LLaVA follows the LLaVA code style, and LLaVA has already been added to Hugging Face. I think you can easily integrate the model code in transformers style soon. Btw, what should I do now, is there any guide?

Cool, I just looked through the code, without going into details. I'll start working on it today and tag you in a draft PR later this week. We can discuss further questions there.

For now I suggest you upload the model weights in safetensors format. We also need to update the configs on the Hub to match the transformers style (see the Llava config as an example). Right now we are missing "processor_config.json", and some others need to be refined. For the class names, feel free to use something like "VideoLlavaProcessor" etc. :)
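In case it helps, here is a minimal sketch of one way to convert and upload the weights, assuming the original checkpoint is a plain PyTorch state dict (file names and the repo id are placeholders):

```python
import torch
from huggingface_hub import HfApi
from safetensors.torch import save_file

# Load the original PyTorch checkpoint (file name is a placeholder).
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Clone to break any tensor sharing and ensure contiguity, both of which
# safetensors requires.
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")

# Push the converted file to the Hub repo (repo id is a placeholder).
HfApi().upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="your-org/video-llava",
)
```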

@RaushanTurganbay if it's added to the transformers library, does that mean it'll be easily exportable to ONNX format?

@YungGump, VideoLLaVA is very similar to the Llava models, and Llava is not ONNX compatible yet (see this feature request). I think this model will become compatible after Llava does.
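Just to illustrate what ONNX support usually looks like once an architecture is covered by Optimum (this does not work for Llava-family models today, the model id below is a placeholder, and the right `ORTModel` class would depend on the architecture):

```python
# Generic Optimum export pattern, shown for illustration only; Llava-family
# models are not yet supported and the model id below is a placeholder.
from optimum.onnxruntime import ORTModelForCausalLM

onnx_model = ORTModelForCausalLM.from_pretrained(
    "your-org/some-supported-model",
    export=True,  # convert the PyTorch weights to ONNX on load
)
onnx_model.save_pretrained("onnx_model/")
```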

@RaushanTurganbay Alright, that's fine. Looking forward to it being in the transformers library and easily fine-tunable with PEFT.
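Once it's in transformers, the standard PEFT recipe should apply directly; a rough LoRA sketch, where the class and checkpoint names assume the finished integration and all hyperparameters are illustrative rather than tuned:

```python
from peft import LoraConfig, get_peft_model
from transformers import VideoLlavaForConditionalGeneration

model = VideoLlavaForConditionalGeneration.from_pretrained(
    "LanguageBind/Video-LLaVA-7B-hf"
)

# Illustrative LoRA settings, not tuned values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # language-model attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable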

Video-LLaVA was added to transformers, feel free to check it out here
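For anyone finding this thread later, a minimal generation sketch based on the integration (dummy frames stand in for a real video; the model card has a full example that samples frames with PyAV):

```python
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model_id = "LanguageBind/Video-LLaVA-7B-hf"
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)
processor = VideoLlavaProcessor.from_pretrained(model_id)

# Dummy clip of 8 frames (frames, height, width, channels) in place of a
# real decoded video.
video = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)

prompt = "USER: <video>Why is this video funny? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```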
