
πŸ‘οΈ VideoGPT+ (Phi-3-mini-4K 3.8B)


πŸ“ Description

VideoGPT+ combines an image encoder and a video encoder: the image encoder contributes detailed spatial understanding, while the video encoder contributes global temporal context. The model processes each video in segments and applies adaptive pooling to the features from both encoders, which improves performance across a range of video benchmarks.
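The segment-wise, dual-encoder flow described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the encoders are toy stand-ins (`toy_image_encoder` and `toy_video_encoder` are hypothetical), and the 2×2 pooled grid size, the placement of image tokens before video tokens within each segment, and the omission of any projection into the LLM's embedding space are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]        # one C-dimensional feature vector
Grid = List[List[Vector]]   # H x W grid of feature vectors


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def adaptive_avg_pool2d(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Average-pool an H x W grid of vectors down to out_h x out_w.

    Bin i along an axis of length n covers [floor(i*n/out), ceil((i+1)*n/out)),
    the same bin boundaries PyTorch's AdaptiveAvgPool2d uses.
    """
    h, w, channels = len(grid), len(grid[0]), len(grid[0][0])
    pooled: Grid = []
    for i in range(out_h):
        r0, r1 = (i * h) // out_h, _ceil_div((i + 1) * h, out_h)
        row: List[Vector] = []
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, _ceil_div((j + 1) * w, out_w)
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append([sum(v[k] for v in cells) / len(cells) for k in range(channels)])
        pooled.append(row)
    return pooled


def flatten(grid: Grid) -> List[Vector]:
    """Turn an H x W grid into a row-major list of H*W tokens."""
    return [vec for row in grid for vec in row]


def encode_video(
    frames: Sequence,
    image_encoder: Callable[[object], Grid],          # hypothetical: frame -> grid
    video_encoder: Callable[[Sequence], List[Grid]],  # hypothetical: segment -> one grid per frame
    frames_per_segment: int,
    pooled_hw: Tuple[int, int] = (2, 2),              # assumed pooled grid size
) -> List[Vector]:
    """Split frames into segments and collect pooled tokens from both encoders."""
    if len(frames) % frames_per_segment != 0:
        raise ValueError("frame count must be a multiple of frames_per_segment")
    tokens: List[Vector] = []
    for start in range(0, len(frames), frames_per_segment):
        segment = frames[start:start + frames_per_segment]
        # Spatial detail: encode every frame independently with the image encoder.
        for frame in segment:
            tokens.extend(flatten(adaptive_avg_pool2d(image_encoder(frame), *pooled_hw)))
        # Temporal context: encode the whole segment at once with the video encoder.
        for grid in video_encoder(segment):
            tokens.extend(flatten(adaptive_avg_pool2d(grid, *pooled_hw)))
    return tokens


# Toy stand-in encoders (hypothetical): each returns 4 x 4 grids of 2-dim features.
def toy_image_encoder(frame: float) -> Grid:
    return [[[frame, float(4 * r + c)] for c in range(4)] for r in range(4)]


def toy_video_encoder(segment: Sequence[float]) -> List[Grid]:
    mean = sum(segment) / len(segment)
    return [[[[mean, 0.0] for _ in range(4)] for _ in range(4)] for _ in segment]


frames = [float(t) for t in range(8)]  # 8 dummy "frames" -> 2 segments of 4
tokens = encode_video(frames, toy_image_encoder, toy_video_encoder, frames_per_segment=4)
print(len(tokens))  # 64
```

With 8 frames split into 2 segments of 4, each segment yields 4 frames × 4 pooled image tokens plus 4 × 4 pooled video tokens, so 64 tokens in total. Because the bin boundaries mirror PyTorch's `AdaptiveAvgPool2d`, a tensor-based pipeline would more naturally call `torch.nn.functional.adaptive_avg_pool2d` instead of the hand-written loop.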

This repository contains VideoGPT+ checkpoints that use the Phi-3-Mini-4K (3.8B) LLM, for the VCGBench, VCGBench-Diverse, and MVBench benchmarks.

💻 Download

To get started, follow these steps:

git lfs install
git clone https://huggingface.co/MBZUAI/VideoGPT-plus_Phi3-mini-4k

📚 Additional Resources

📜 Citations and Acknowledgments

  @article{Maaz2024VideoGPT+,
      title={VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding},
      author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Salman and Khan, Fahad Shahbaz},
      journal={arXiv preprint arXiv:2406.09418},
      year={2024},
      url={https://arxiv.org/abs/2406.09418}
  }