Dataset Preparation
Stage 2: Video-language Alignment
Pretraining
The public portion of the pretraining data we use is as follows:
Evaluation
For evaluation, we follow VINDLU to prepare the datasets, but we DO NOT compress the videos and images. Instead, we use the original data and load the same JSON annotation files provided by VINDLU.
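As a rough illustration of loading annotation JSON files against the original, uncompressed media, here is a minimal Python sketch. The helper name `load_annotations`, the `media_root` argument, and the assumed schema (a JSON list whose entries carry a `video` or `image` relative path plus a `caption`) are assumptions made for this example, not something confirmed by this repository or by VINDLU; check the actual JSON files before relying on it.

```python
import json
from pathlib import Path


def load_annotations(json_path, media_root):
    """Hypothetical helper: pair each annotation with its original media file.

    Assumes the JSON file holds a list of dicts, each with a "video" or
    "image" relative path and a "caption" string. This schema is an
    assumption for illustration only.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)

    samples = []
    for entry in entries:
        # Prefer the "video" key and fall back to "image".
        rel_path = entry.get("video") or entry.get("image")
        if rel_path is None:
            # Skip entries that reference no media file.
            continue
        # Resolve against the directory holding the uncompressed originals.
        samples.append((Path(media_root) / rel_path, entry["caption"]))
    return samples
```

A call such as `load_annotations("annotations.json", "/path/to/original/videos")` (both paths are placeholders) would return a list of `(media_path, caption)` pairs pointing at the original files rather than compressed copies.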
Video-Text Retrieval
Stage 3: VideoChat
Pretraining
Evaluation
MVBench
Please refer to MVBench for details.