Commit 54cb1ac (parent: 8a97bcf), committed by wangyueqian

Update README.md

Files changed (1):
  1. README.md +16 -0
README.md CHANGED
@@ -1,3 +1,19 @@
  ---
  license: llama2
+ datasets:
+ - wangyueqian/HawkEye-IT
+ - wangyueqian/InternVid-G
+ - OpenGVLab/VideoChat2-IT
+ language:
+ - en
+ pipeline_tag: visual-question-answering
  ---
+ # <div style="display: flex; align-items: center;"> <img src="https://github.com/yellow-binary-tree/HawkEye/blob/main/assets/hawk.png?raw=True" alt="logo" width="50" height="50" style="margin: 0 10px;"> <span style="margin: 10px 10px;">&emsp;HawkEye: Training Video-Text LLMs for Grounding Text in Videos</span> </div>
+
+ This repo provides the checkpoint of HawkEye and our implementation of VideoChat2.
+
+ `videochat2-stage3-our_impl.pth` is the checkpoint of our reproduction of VideoChat2. You can use it as a drop-in substitute for `hawkeye.pth` (see the loading sketch after the diff).
+ - Difference from HawkEye: it is not trained with data from [InternVid-G](https://github.com/yellow-binary-tree/HawkEye/blob/main/internvid_g/README.md).
+ - Differences from the original implementation of VideoChat2: the visual encoder is frozen, and it is not trained with the image data from [VideoChat2-IT](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/DATA.md).
+
+ For more details, please refer to our [paper](https://arxiv.org/abs/2403.10228) and [GitHub repository](https://github.com/yellow-binary-tree/HawkEye).
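
Here is a minimal sketch of the checkpoint substitution described above. The repo id `wangyueqian/HawkEye` and the assumption that the `.pth` files are plain `torch.load`-able checkpoints are not confirmed by this README (only the filenames are); the authors' actual loading code lives in their GitHub repository.

```python
# Minimal sketch (not the authors' code): fetch one of the two checkpoints
# from the Hub and inspect it with PyTorch.
import torch
from huggingface_hub import hf_hub_download

# ASSUMPTION: the model repo id; only the filenames are stated in the README.
REPO_ID = "wangyueqian/HawkEye"

# Swap in "hawkeye.pth" to use the HawkEye checkpoint instead of the
# VideoChat2 reproduction; the README says the two are interchangeable.
ckpt_path = hf_hub_download(repo_id=REPO_ID,
                            filename="videochat2-stage3-our_impl.pth")

# ASSUMPTION: the .pth file is a plain torch-serialized object. It may be
# a raw state dict or a dict wrapping one under a key such as "model";
# inspect the top-level keys before loading it into a model.
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print("top-level keys:", list(state)[:5])
```

If those assumptions hold, swapping the `filename` argument between the two `.pth` files is all the substitution described above requires.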