Video-CCAM-4B is a lightweight Video-MLLM built on Phi-3-mini-4k-instruct and SigLIP SO400M. Note: Phi-3-mini-4k-instruct here refers to the previous version of that model; to get the matching checkpoint, use git commit id ff07dc01615f8113924aed013115ab2abd32115b.
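One way to pin the base model to that commit is the `revision` argument of `huggingface_hub.snapshot_download`. The sketch below is an illustration, not the authors' procedure: the repo id `microsoft/Phi-3-mini-4k-instruct` is an assumption (the card only names the model), and the helper names `phi3_download_kwargs` and `download_phi3` are hypothetical.

```python
# Assumption: the base model is hosted under this Hub repo id.
PHI3_REPO_ID = "microsoft/Phi-3-mini-4k-instruct"
# Commit id quoted in the note above.
PHI3_REVISION = "ff07dc01615f8113924aed013115ab2abd32115b"


def phi3_download_kwargs(local_dir=None):
    """Build the keyword arguments that pin the download to one commit."""
    kwargs = {"repo_id": PHI3_REPO_ID, "revision": PHI3_REVISION}
    if local_dir is not None:
        kwargs["local_dir"] = local_dir
    return kwargs


def download_phi3(local_dir=None):
    """Download the pinned snapshot and return its local path.

    The import is kept inside the function so that the module can be
    loaded without huggingface_hub installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(**phi3_download_kwargs(local_dir))
```

Calling `download_phi3("./phi3-previous")` would fetch the files at that commit into the given directory; the directory name is only an example.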
Inference uses Hugging Face transformers on NVIDIA GPUs. The following requirements were tested on Python 3.10:
torch==2.1.0
torchvision==0.16.0
transformers==4.40.2
peft==0.10.0
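Before running inference, it can help to confirm that the environment matches these pins. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `find_mismatches` are hypothetical, and stripping a local build tag such as `+cu121` is an assumption about how the installed version strings look.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above (tested on Python 3.10).
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "transformers": "4.40.2",
    "peft": "0.10.0",
}


def installed_versions(names):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            found[name] = version(name).split("+")[0]
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {package: (wanted, found)} for every package that differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }
```

For example, `find_mismatches(PINNED, installed_versions(PINNED))` returns an empty dict when every package matches its pin.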
Please refer to Video-CCAM for inference and evaluation instructions.
| #Frames | 32 | 96 |
|---|---|---|
| w/o subs | 48.2 | 49.6 |
| w/ subs | 51.7 | 53.0 |
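To read the table at a glance, the differences between settings can be computed directly from the four reported scores. This is a small arithmetic sketch; the dictionary keys and function names are hypothetical, and the numbers are copied from the table above.

```python
# Scores copied from the table: setting -> {number of frames: score}.
SCORES = {
    "without_subs": {32: 48.2, 96: 49.6},
    "with_subs": {32: 51.7, 96: 53.0},
}


def subtitle_gain(frames):
    """Score difference from adding subs at a fixed frame count."""
    diff = SCORES["with_subs"][frames] - SCORES["without_subs"][frames]
    return round(diff, 1)


def frame_gain(setting):
    """Score difference from going from 32 to 96 frames in one setting."""
    diff = SCORES[setting][96] - SCORES[setting][32]
    return round(diff, 1)
```

With these numbers, adding subs changes the score by about 3.5 points at 32 frames and 3.4 at 96 frames, while going from 32 to 96 frames adds about 1.4 points without subs and 1.3 with them.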
The model is licensed under the MIT license.