License: apache-2.0

πŸ‘οΈ VideoGPT+ (Phi-3-mini-4K 3.8B - Projector Pretrain Weights)


πŸ“ Description

VideoGPT+ integrates an image encoder and a video encoder: the image encoder provides detailed spatial understanding, while the video encoder captures global temporal context. It processes each video in segments and applies adaptive pooling to the features from both encoders, which improves performance across a range of video benchmarks.
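
The segment-wise processing and adaptive pooling described above can be illustrated with a minimal, framework-free sketch. This is not the VideoGPT+ implementation: the helper names `split_into_segments` and `adaptive_avg_pool_2d`, the segment length, the grid sizes, and the choice of plain average pooling are all illustrative assumptions. It only shows the general idea of splitting a clip into fixed-length segments and pooling a grid of per-patch features down to a fixed, smaller grid, which reduces the number of visual tokens.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Vector = List[float]
Grid = List[List[Vector]]  # indexed as grid[row][col][channel]


def split_into_segments(frames: Sequence[T], segment_len: int) -> List[List[T]]:
    """Split frames into consecutive, non-overlapping segments.

    Hypothetical helper: a shorter trailing segment is kept rather than
    padded or dropped.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(frames[i:i + segment_len]) for i in range(0, len(frames), segment_len)]


def adaptive_avg_pool_2d(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Average-pool an H x W grid of feature vectors to out_h x out_w.

    Each output cell averages the input cells in the bin
    [floor(i * H / out_h), ceil((i + 1) * H / out_h)) along each axis,
    the usual adaptive-pooling rule, so bins may overlap when the sizes
    do not divide evenly.
    """
    in_h, in_w = len(grid), len(grid[0])
    channels = len(grid[0][0])
    pooled: Grid = []
    for i in range(out_h):
        # Integer floor/ceil avoids floating-point rounding in the bin edges.
        h0 = (i * in_h) // out_h
        h1 = -(-((i + 1) * in_h) // out_h)
        row: List[Vector] = []
        for j in range(out_w):
            w0 = (j * in_w) // out_w
            w1 = -(-((j + 1) * in_w) // out_w)
            cells = [grid[y][x] for y in range(h0, h1) for x in range(w0, w1)]
            row.append([sum(c[k] for c in cells) / len(cells) for k in range(channels)])
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # Eight dummy frames split into segments of four (the length is an assumption).
    print(split_into_segments(list(range(8)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    # A 4x4 grid of 1-channel features pooled to 2x2.
    grid = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
    print(adaptive_avg_pool_2d(grid, 2, 2))  # [[[2.5], [4.5]], [[10.5], [12.5]]]
```

Adaptive pooling is specified by the output size rather than by a kernel size, so encoders that produce differently sized feature grids can be pooled to the same token budget.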

This repository contains the pretrained projector weights for the image encoder (CLIP ViT-L/14) and the video encoder (InternVideo2).
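
As a rough illustration of what a projector does, the sketch below assumes a two-layer MLP with a GELU activation (Linear, GELU, Linear) applied to each visual token, similar to the projector used in LLaVA-1.5. The card does not state the actual VideoGPT+ projector architecture, layer widths, or weight-file layout, so `MLPProjector`, the toy widths, and the random initialization are hypothetical. One projector per encoder maps that encoder's features to a common width so that image and video tokens can be passed to the language model together.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # indexed as weight[out_feature][in_feature]


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), written with the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    # y = W x + b for a single token vector.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class MLPProjector:
    """Hypothetical two-layer projector: Linear -> GELU -> Linear.

    Maps encoder features of width in_dim to width out_dim (standing in for
    the language model's hidden size). Weights are random here; pretrained
    projector weights would be loaded in their place.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        a1, a2 = 1.0 / math.sqrt(in_dim), 1.0 / math.sqrt(out_dim)
        self.w1 = [[rng.uniform(-a1, a1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b1 = [0.0] * out_dim
        self.w2 = [[rng.uniform(-a2, a2) for _ in range(out_dim)] for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def __call__(self, tokens: List[Vector]) -> List[Vector]:
        # Each visual token is projected independently.
        return [
            linear(self.w2, self.b2, [gelu(h) for h in linear(self.w1, self.b1, t)])
            for t in tokens
        ]


if __name__ == "__main__":
    # Toy widths; real widths would come from each encoder and the LLM config.
    image_projector = MLPProjector(in_dim=8, out_dim=16, seed=1)
    video_projector = MLPProjector(in_dim=6, out_dim=16, seed=2)
    image_tokens = image_projector([[0.1] * 8, [0.2] * 8])
    video_tokens = video_projector([[0.3] * 6])
    # Both projectors emit the same width, so their tokens can share one sequence.
    sequence = image_tokens + video_tokens
    print(len(sequence), len(sequence[0]))  # 3 16
```

Keeping a separate projector per encoder lets each one adapt to its own input width while both target the same output width.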

💻 Download

To get started, follow these steps:

git lfs install
git clone https://huggingface.co/MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain

📚 Additional Resources

📜 Citations and Acknowledgments

  @article{Maaz2024VideoGPT+,
      title={VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding},
      author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Salman and Khan, Fahad Shahbaz},
      journal={arXiv preprint arXiv:2406.09418},
      year={2024},
      url={https://arxiv.org/abs/2406.09418}
  }