metadata

license: llama2
datasets:
  - wangyueqian/HawkEye-IT
  - wangyueqian/InternVid-G
  - OpenGVLab/VideoChat2-IT
language:
  - en
pipeline_tag: visual-question-answering

HawkEye: Training Video-Text LLMs for Grounding Text in Videos

This repo provides the HawkEye checkpoint and a checkpoint from our own implementation of VideoChat2.

videochat2-stage3-our_impl.pth is the checkpoint from our reproduction of VideoChat2. You can use it as a substitute for hawkeye.pth.

It differs from HawkEye in that it was not trained with data from InternVid-G.
It differs from the original implementation of VideoChat2 in that the visual encoder is frozen and it was not trained with image data from VideoChat2-IT.
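
Because the reproduced checkpoint is meant as a drop-in substitute for hawkeye.pth, switching between the two should only require changing the checkpoint path. The sketch below illustrates this under stated assumptions: the helper names `resolve_checkpoint` and `load_checkpoint`, the variant keys, and the `ckpt_dir` layout are hypothetical and not part of the released code, and the internal structure of the .pth files is not described here, so the loaded object is returned as-is.

```python
from pathlib import Path

# File names as given in this repo; the dictionary keys are hypothetical labels.
CHECKPOINTS = {
    "hawkeye": "hawkeye.pth",
    "videochat2_our_impl": "videochat2-stage3-our_impl.pth",
}


def resolve_checkpoint(ckpt_dir, variant="hawkeye"):
    """Return the path of the checkpoint file for the chosen variant."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return Path(ckpt_dir) / CHECKPOINTS[variant]


def load_checkpoint(ckpt_dir, variant="hawkeye"):
    """Load the checkpoint onto the CPU (assumes PyTorch is installed)."""
    import torch  # imported lazily so path resolution works without PyTorch

    path = resolve_checkpoint(ckpt_dir, variant)
    # The layout of the loaded object is an unknown here; inspect it
    # before passing it to a model.
    return torch.load(path, map_location="cpu")
```

For example, `resolve_checkpoint("ckpts", "videochat2_our_impl")` points at the reproduced VideoChat2 weights, while the default variant points at hawkeye.pth. How the loaded weights are attached to a model depends on the code in the GitHub repository.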

For more details, please refer to our paper and GitHub repository.