---
license: bsd-3-clause
---

# E.T. Chat

[arXiv](https://arxiv.org/abs/2409.18111) | [Project Page](https://polyu-chenlab.github.io/etbench) | [GitHub](https://github.com/PolyU-ChenLab/ETBench)

E.T. Chat is a time-sensitive Video-LLM that reformulates timestamp prediction as an embedding matching problem, serving as a strong baseline on E.T. Bench. It consists of a visual encoder, a frame compressor, and an LLM. A special token \<vid\> is introduced to trigger frame embedding matching for timestamp prediction.
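The embedding-matching idea can be sketched as follows. This is a minimal illustration under our own assumptions, not the actual E.T. Chat implementation: the function name, cosine-similarity scoring, and frame-rate conversion are all hypothetical, and the real model produces the \<vid\> embedding with its LLM rather than taking it as input.

```python
import numpy as np

def match_timestamp(vid_embedding, frame_embeddings, fps=1.0):
    """Predict a timestamp by matching a <vid> token embedding
    against per-frame embeddings (hypothetical sketch).

    vid_embedding:    (d,) embedding produced at the <vid> token
    frame_embeddings: (T, d) compressed embeddings for T frames
    fps:              frames per second of the sampled frames
    """
    # Cosine similarity between the <vid> embedding and every frame
    v = vid_embedding / np.linalg.norm(vid_embedding)
    f = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    scores = f @ v  # shape (T,)

    # The best-matching frame index is converted to seconds
    best_frame = int(np.argmax(scores))
    return best_frame / fps

# Toy example: frame 2 is constructed to be most similar to <vid>
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
vid = frames[2] + 0.01 * rng.normal(size=16)
print(match_timestamp(vid, frames, fps=1.0))  # prints 2.0
```

Framing timestamps as matching against frame embeddings, instead of regressing or generating numbers as text, keeps the prediction grounded in positions the model has actually seen.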
## 🔖 Model Details

### Model Description

- **Developed by:** Ye Liu
- **Model type:** Multi-modal Large Language Model
- **Language(s):** English
- **License:** BSD-3-Clause

### Training Data

The stage-3 checkpoint of E.T. Chat was trained on the [ET-Instruct-164K](https://huggingface.co/datasets/PolyU-ChenLab/ET-Instruct-164K) dataset.

### More Details

Please refer to our [GitHub repository](https://github.com/PolyU-ChenLab/ETBench) for more details about this model.

## 📖 Citation

Please kindly cite our paper if you find this project helpful.

```bibtex
@inproceedings{liu2024etbench,
  title={E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding},
  author={Liu, Ye and Ma, Zongyang and Qi, Zhongang and Wu, Yang and Chen, Chang Wen and Shan, Ying},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```