Can InstructBLIP process videos?
I recently looked at the source of the blip2_vicuna-instruct7b model in the Salesforce/LAVIS repository and found code for handling videos. I don't know whether this made it into the Hugging Face InstructBLIP model. So I'm asking: can InstructBLIP handle videos, and if yes, how do I go about it?
Hi,
Thanks for your interest in InstructBLIP. Support for videos is not yet present in the Transformers library. Did the authors release any checkpoints trained on video?
I'm not aware of any at the moment; I'd have to check. What I saw was just code for handling videos with a low frame count.
also interested in processing videos
Can you share the snippet for handling videos from the original authors? It could probably be adapted to work with the Transformers model.
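As a starting point for such an adaptation, here is a minimal sketch of the usual frame-sampling step: pick evenly spaced frames from a clip, then feed each frame through the image processor as you would a single image. This mirrors the general idea in LAVIS (looping over the time dimension and processing frames independently); the function below is illustrative, not code from either repository.

```python
def sample_frame_indices(num_frames_total: int, num_frames_wanted: int) -> list[int]:
    """Return evenly spaced frame indices from a clip of
    `num_frames_total` frames. If the clip is shorter than the
    requested count, all frames are returned."""
    if num_frames_wanted >= num_frames_total:
        return list(range(num_frames_total))
    step = num_frames_total / num_frames_wanted
    return [int(i * step) for i in range(num_frames_wanted)]


# Example: pick 4 frames from a 100-frame clip.
indices = sample_frame_indices(100, 4)
print(indices)  # [0, 25, 50, 75]
```

Each sampled frame could then be passed to the InstructBLIP processor as a regular image; how the per-frame features are combined (e.g. concatenating Q-Former outputs, as LAVIS does) is the part that would need porting to the Transformers model.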
Hi,
I'm trying to run the demo at the end of https://huggingface.co/docs/transformers/main/en/model_doc/instructblip#transformers.InstructBlipForConditionalGeneration, but the model does not generate text; instead it gives this:
Loading checkpoint shards: 100%|██████████| 4/4 [00:24<00:00, 6.10s/it]
/home/tanya.kaintura/Project/myenv/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:412: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
? ? ? ? ? ? ? ? ? ? ? ? (the same "?" token repeated for the whole output)
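The warning itself is harmless and does not explain the "?" output: it only says that `top_p` has no effect while `do_sample=False`. A minimal sketch of silencing it, assuming the model, processor, and inputs are set up as in the linked docs (the `generate` call here is commented out because it needs the loaded model):

```python
# Either enable sampling so top_p takes effect, or drop top_p entirely.
generation_kwargs = dict(
    do_sample=True,      # enable nucleus sampling so top_p is actually used
    top_p=0.9,
    max_new_tokens=256,
)
# outputs = model.generate(**inputs, **generation_kwargs)
```

If the output is still a string of "?" tokens after that, the problem lies elsewhere (e.g. in how the checkpoint is loaded or the inputs are prepared), not in these sampling flags.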
Hi,
For videos I recommend taking a look at VideoBLIP: https://huggingface.co/models?other=video-to-text
PR is open for it now here: https://github.com/huggingface/transformers/pull/30182