Buckets:

weathon
/

LongVideoBench-bucket

0 Bytes

38 files

Updated 2 days ago

Name	Size	Uploaded	Xet hash
.gitattributes	2.31 kB xet	2 days ago	b6a9e0dd
README.md	10.1 kB xet	2 days ago	81543e09
lvb_test_wo_gt.json	4.89 MB xet	2 days ago	3c2fec16
lvb_val.json	1.31 MB xet	2 days ago	f28361c9
subtitles.tar	117 MB xet	2 days ago	b54e27a6
test-00000-of-00001.parquet	1.61 MB xet	2 days ago	4572414a
validation-00000-of-00001.parquet	427 kB xet	2 days ago	14b38194
videos.tar.part.aa	5.24 GB xet	2 days ago	be628e6d
videos.tar.part.ab	5.24 GB xet	2 days ago	68b57a7a
videos.tar.part.ac	5.24 GB xet	2 days ago	14cdba42
videos.tar.part.ad	5.24 GB xet	2 days ago	99cc3a22
videos.tar.part.ae	5.24 GB xet	2 days ago	9111bb62
videos.tar.part.af	5.24 GB xet	2 days ago	b9ef5c24
videos.tar.part.ag	5.24 GB xet	2 days ago	b52872d1
videos.tar.part.ah	5.24 GB xet	2 days ago	3b561878
videos.tar.part.ai	5.24 GB xet	2 days ago	507abb19
videos.tar.part.aj	5.24 GB xet	2 days ago	80ec1573
videos.tar.part.ak	5.24 GB xet	2 days ago	9935e670
videos.tar.part.al	5.24 GB xet	2 days ago	63eea8c6
videos.tar.part.am	5.24 GB xet	2 days ago	ca74faac
videos.tar.part.an	5.24 GB xet	2 days ago	6387f421
videos.tar.part.ao	5.24 GB xet	2 days ago	b21baa38
videos.tar.part.ap	5.24 GB xet	2 days ago	e079e21b
videos.tar.part.aq	5.24 GB xet	2 days ago	14d79826
videos.tar.part.ar	5.24 GB xet	2 days ago	a91f9fbf
videos.tar.part.as	5.24 GB xet	2 days ago	567ff4ec
videos.tar.part.at	5.24 GB xet	2 days ago	0a2da338
videos.tar.part.au	5.24 GB xet	2 days ago	2990b3b2
videos.tar.part.av	5.24 GB xet	2 days ago	3963a06e
videos.tar.part.aw	5.24 GB xet	2 days ago	793642e3
videos.tar.part.ax	5.24 GB xet	2 days ago	d369734c
videos.tar.part.ay	5.24 GB xet	2 days ago	012e5de2
videos.tar.part.az	5.24 GB xet	2 days ago	ce6d88cb
videos.tar.part.ba	5.24 GB xet	2 days ago	2b1402df
videos.tar.part.bb	5.24 GB xet	2 days ago	10a80fa2
videos.tar.part.bc	5.24 GB xet	2 days ago	2210b1e4
videos.tar.part.bd	5.24 GB xet	2 days ago	8a4d8089
videos.tar.part.be	4.28 GB xet	2 days ago	1c387e30

README.md

Dataset Card for LongVideoBench

Large multimodal models (LMMs) are handling increasingly long and complex inputs. However, few public benchmarks are available to assess these advancements. To address this, we introduce LongVideoBench, a question-answering benchmark with video-language interleaved inputs up to an hour long. It comprises 3,763 web-collected videos with subtitles across diverse themes, designed to evaluate LMMs on long-term multimodal understanding.

The main challenge that LongVideoBench targets is to accurately retrieve and reason over detailed information from lengthy inputs. We present a novel task called referring reasoning, where questions contain a referring query that references related video contexts, requiring the model to reason over these details.

LongVideoBench includes 6,678 human-annotated multiple-choice questions across 17 categories, making it one of the most comprehensive benchmarks for long-form video understanding. Evaluations show significant challenges even for advanced proprietary models (e.g., GPT-4o, Gemini-1.5-Pro, GPT-4-Turbo), with open-source models performing worse. Performance improves only when models process more frames, establishing LongVideoBench as a valuable benchmark for future long-context LMMs.

Dataset Details

Dataset Description

Curated by: LongVideoBench Team
Language(s) (NLP): English
License: CC-BY-NC-SA 4.0

Dataset Sources

Repository: https://github.com/longvideobench/LongVideoBench
Homepage: https://longvideobench.github.io
Leaderboard: https://huggingface.co/spaces/longvideobench/LongVideoBench

Leaderboard (until Oct. 14, 2024)

We rank models by Test Total Performance.

Model	Test Total (5341)	Test 8s-15s	Test 15s-60s	Test 180s-600s	Test 900s-3600s	Val Total (1337)
GPT-4o (0513) (256)	66.7	71.6	76.8	66.7	61.6	66.7
Aria (256)	65.0	69.4	76.6	64.6	60.1	64.2
LLaVA-Video-72B-Qwen2 (128)	64.9	72.4	77.4	63.9	59.3	63.9
Gemini-1.5-Pro (0514) (256)	64.4	70.2	75.3	65.0	59.1	64.0
LLaVA-OneVision-QWen2-72B-OV (32)	63.2	74.3	77.4	61.6	56.5	61.3
LLaVA-Video-7B-Qwen2 (128)	62.7	69.7	76.5	62.1	56.6	61.1
Gemini-1.5-Flash (0514) (256)	62.4	66.1	73.1	63.1	57.3	61.6
GPT-4-Turbo (0409) (256)	60.7	66.4	71.1	61.7	54.5	59.1
InternVL2-40B (16)	60.6	71.4	76.6	57.5	54.4	59.3
GPT-4o-mini (250)	58.8	66.6	73.4	56.9	53.4	56.5
MiniCPM-V-2.6 (64)	57.7	62.5	69.1	54.9	49.8	54.9
Qwen2-VL-7B (256)	56.8	60.1	67.6	56.7	52.5	55.6
Kangaroo (64)	54.8	65.6	65.7	52.7	49.1	54.2
PLLaVA-34B (32)	53.5	60.1	66.8	50.8	49.1	53.2
InternVL-Chat-V1-5-26B (16)	51.7	61.3	62.7	49.5	46.6	51.2
LLaVA-Next-Video-34B (32)	50.5	57.6	61.6	48.7	45.9	50.5
Phi-3-Vision-Instruct (16)	49.9	58.3	59.6	48.4	45.1	49.6
Idefics2 (16)	49.4	57.4	60.4	47.3	44.7	49.7
Mantis-Idefics2 (16)	47.6	56.1	61.4	44.6	42.5	47.0
LLaVA-Next-Mistral-7B (8)	47.1	53.4	57.2	46.9	42.1	49.1
PLLaVA-13B (32)	45.1	52.9	54.3	42.9	41.2	45.6
InstructBLIP-T5-XXL (8)	43.8	48.1	50.1	44.5	40.0	43.3
Mantis-BakLLaVA (16)	43.7	51.3	52.7	41.1	40.1	43.7
BLIP-2-T5-XXL (8)	43.5	46.7	47.4	44.2	40.9	42.7
LLaVA-Next-Video-M7B (32)	43.5	50.9	53.1	42.6	38.9	43.5
LLaVA-1.5-13B (8)	43.1	49.0	51.1	41.8	39.6	43.4
ShareGPT4Video (16)	41.8	46.9	50.1	40.0	38.7	39.7
VideoChat2 (Mistral-7B) (16)	41.2	49.3	49.3	39.0	37.5	39.3
LLaVA-1.5-7B (8)	40.4	45.0	47.4	40.1	37.0	40.3
mPLUG-Owl2 (8)	39.4	49.4	47.3	38.7	34.3	39.1
PLLaVA-7B (32)	39.2	45.3	47.3	38.5	35.2	40.2
VideoLLaVA (8)	37.6	43.1	44.6	36.4	34.4	39.1
VideoChat2 (Vicuna 7B) (16)	35.1	38.1	40.5	33.5	33.6	36.0

Uses

Download the dataset with the Hugging Face CLI:

huggingface-cli download longvideobench/LongVideoBench --repo-type dataset --local-dir LongVideoBench --local-dir-use-symlinks False
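
The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper name `download_longvideobench` and its `only_annotations` flag are hypothetical and not part of the LongVideoBench repository.

```python
from typing import Optional, List


def download_longvideobench(local_dir: str = "LongVideoBench",
                            only_annotations: bool = False) -> str:
    """Download the dataset repository and return the local directory path.

    Hypothetical helper: wraps huggingface_hub.snapshot_download.
    """
    # Imported lazily so this module still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    # Restrict the download to the small JSON annotation files if requested.
    patterns: Optional[List[str]] = ["*.json"] if only_annotations else None
    return snapshot_download(
        repo_id="longvideobench/LongVideoBench",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=patterns,
    )
```

Calling `download_longvideobench(only_annotations=True)` would fetch only `lvb_val.json` and `lvb_test_wo_gt.json`, which can be useful for inspecting the questions before committing to the full video download.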

Extract from the .tar files:

cat videos.tar.part.* > videos.tar
tar -xvf videos.tar
tar -xvf subtitles.tar
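
On systems without `cat` and `tar`, the same two steps can be done with the Python standard library. This is a minimal sketch, not part of the official tooling; the function names `join_parts` and `extract_tar` are hypothetical, and it assumes the parts sort correctly by filename (aa, ab, ...), as they do in the file listing.

```python
import shutil
import tarfile
from pathlib import Path


def join_parts(data_dir: str,
               prefix: str = "videos.tar.part.",
               out_name: str = "videos.tar") -> Path:
    """Concatenate split archive parts in lexicographic order into one file."""
    root = Path(data_dir)
    parts = sorted(root.glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {root}")
    out_path = root / out_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Streams in chunks, so memory use stays small for multi-GB parts.
                shutil.copyfileobj(src, out)
    return out_path


def extract_tar(tar_path: Path, dest: str) -> None:
    """Extract every member of the archive into dest."""
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)
```

For example, `extract_tar(join_parts("LongVideoBench"), "LongVideoBench")` followed by `extract_tar(Path("LongVideoBench/subtitles.tar"), "LongVideoBench")` mirrors the shell commands above. Note that joining needs enough free disk space for a second copy of the video data.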

Use the LongVideoBench dataloader to load the data from the raw MP4 files and subtitles:

(a) Install the dataloader:

git clone https://github.com/LongVideoBench/LongVideoBench.git
cd LongVideoBench
pip install -e .

(b) Load the dataset in a Python script:

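from longvideobench import LongVideoBenchDataset

# Directory holding the extracted videos, subtitles, and annotation JSON files
# (here the --local-dir used in the download command above; adjust as needed).
YOUR_DATA_PATH = "LongVideoBench"

# validation
dataset = LongVideoBenchDataset(YOUR_DATA_PATH, "lvb_val.json", max_num_frames=64)

# test
dataset = LongVideoBenchDataset(YOUR_DATA_PATH, "lvb_test_wo_gt.json", max_num_frames=64)

print(dataset[0]["inputs"]) # A list consisting of PIL.Image and strings.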

The "inputs" are interleaved video frames and text subtitles, followed by the question and option prompts. You can then convert them to whatever format your LMM accepts.
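
One possible conversion is sketched below, assuming a model that takes a single prompt string with a placeholder per frame plus a separate list of images. The function name `split_interleaved` and the `<image>` placeholder are assumptions for illustration; the placeholder token your model expects may differ.

```python
from typing import Any, List, Tuple


def split_interleaved(inputs: List[Any],
                      image_token: str = "<image>") -> Tuple[str, List[Any]]:
    """Separate an interleaved list into one prompt string and a list of frames.

    Strings are kept as text; every other item (e.g. a PIL.Image) is treated
    as a frame and replaced in the prompt by image_token, preserving order.
    """
    text_parts: List[str] = []
    frames: List[Any] = []
    for item in inputs:
        if isinstance(item, str):
            text_parts.append(item)
        else:
            frames.append(item)
            text_parts.append(image_token)
    return "\n".join(text_parts), frames
```

With this helper, `prompt, frames = split_interleaved(dataset[0]["inputs"])` yields a prompt in which each placeholder marks where the corresponding frame appeared relative to the subtitles.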

Direct Use

This dataset is meant to evaluate LMMs on video understanding and long-context understanding abilities.

Out-of-Scope Use

We advise against using this dataset for training.

Dataset Structure

lvb_val.json: Validation set annotations.
lvb_test_wo_gt.json: Test set annotations. The correct choices are not provided.
videos.tar.part.*: Split tar archive of the videos (concatenate the parts before extracting).
subtitles.tar: Tar archive of the subtitles.
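
Before running a model, it can help to check the annotation files. The sketch below assumes each file is a top-level JSON list of objects (an assumption, not confirmed by this card) and makes no assumption about field names; `summarize_annotations` is a hypothetical helper.

```python
import json
from typing import List, Tuple


def summarize_annotations(path: str) -> Tuple[int, List[str]]:
    """Return the number of records and the sorted union of their keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of records")
    keys = sorted({key for rec in records if isinstance(rec, dict) for key in rec})
    return len(records), keys
```

If the assumption holds, the record counts for `lvb_val.json` and `lvb_test_wo_gt.json` should match the 1337 validation and 5341 test questions in the leaderboard header, which together give the 6,678 questions stated above.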

Dataset Card Contact

haoning001@e.ntu.edu.sg

@misc{wu2024longvideobenchbenchmarklongcontextinterleaved,
      title={LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding}, 
      author={Haoning Wu and Dongxu Li and Bei Chen and Junnan Li},
      year={2024},
      eprint={2407.15754},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15754}, 
}

Last updated: Jun 15