wangyueqian/HawkEye

Visual Question Answering, Transformers, English


HawkEye: Training Video-Text LLMs for Grounding Text in Videos

This repo provides the HawkEye checkpoint and a checkpoint from our own implementation of VideoChat2.

videochat2-stage3-our_impl.pth is the checkpoint of our reproduction of VideoChat2. You can use it as a substitute for hawkeye.pth.

It differs from HawkEye in that it was not trained with data from InternVid-G.
It differs from the original implementation of VideoChat2 in that the visual encoder is frozen and it was not trained with the image data from VideoChat2-IT.
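
Since the two checkpoints are interchangeable as files, choosing between them comes down to choosing a file name. The following is a minimal sketch, not part of the official HawkEye code: the helper names `checkpoint_filename` and `download_checkpoint` and the variant keys are hypothetical, and it assumes both files sit at the root of this repo and that the `huggingface_hub` package is installed for the download step.

```python
# Repo id and checkpoint file names as given in this model card.
REPO_ID = "wangyueqian/HawkEye"
CHECKPOINTS = {
    "hawkeye": "hawkeye.pth",                        # HawkEye
    "videochat2": "videochat2-stage3-our_impl.pth",  # our VideoChat2 reproduction
}


def checkpoint_filename(variant: str = "hawkeye") -> str:
    """Return the checkpoint file name for a variant (hypothetical helper)."""
    if variant not in CHECKPOINTS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[variant]


def download_checkpoint(variant: str = "hawkeye") -> str:
    """Download the chosen checkpoint and return its local path.

    Assumes the file is stored at the root of the Hub repo.
    """
    # Imported lazily so checkpoint_filename works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(variant))
```

Calling `download_checkpoint("videochat2")` would fetch the VideoChat2 reproduction; the returned path can then be passed wherever the HawkEye code expects the path to hawkeye.pth.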

For more details, please refer to our paper and GitHub repository.

