Commit 17eb8eb (verified) by HY-Wan · 1 Parent(s): ea9e96f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
1
+ ---
2
+ license: mit
3
+ task_categories:
4
+ - video-classification
5
+ - reinforcement-learning
6
+ - robotics
7
+ language:
8
+ - en
9
+ tags:
10
+ - games
11
+ - maze
12
+ - sokoban
13
+ - 3d-navigation
14
+ - multimodal
15
+ - video
16
+ - planning
17
+ size_categories:
18
+ - 10K<n<100K
19
+ ---
20
+
21
+ # VR-Bench: A Multimodal Video Reasoning Benchmark
22
+
23
+ ## Dataset Description
24
+
25
+ VR-Bench is a multimodal dataset of gameplay video demonstrations across five game types: mazes, irregular mazes, 3D mazes, Sokoban puzzles, and trap fields. It is designed for training AI models on visual reasoning, planning, and sequential decision-making tasks.
26
+
27
+ ## Dataset Structure
28
+
29
+ The dataset is organized into three main directories:
30
+
31
+ - `train_data/`: Training data with subdirectories for each game type and difficulty level
32
+ - `test_data/`: Test data with the same structure as training data
33
+ - `test_data_merge/`: Merged test data organized by game type (without difficulty separation)
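
As a rough illustration of the layout described above, the sketch below enumerates the directories it implies. The folder names (`maze`, `irregular_maze`, `maze3d`, `sokoban`, `trapfield`) and the `split/game/difficulty` nesting are assumptions made for the example; the README does not spell out the exact names on disk.

```python
from pathlib import PurePosixPath

# Hypothetical directory names: the README lists the game types and
# difficulty levels but not the exact folder names used on disk.
GAME_TYPES = ["maze", "irregular_maze", "maze3d", "sokoban", "trapfield"]
DIFFICULTIES = ["easy", "medium", "hard"]


def expected_dirs(root="."):
    """List the directories implied by the layout described above."""
    base = PurePosixPath(root)
    dirs = []
    # train_data/ and test_data/: one folder per game type and difficulty
    # (the nesting order is an assumption).
    for split in ("train_data", "test_data"):
        for game in GAME_TYPES:
            for level in DIFFICULTIES:
                dirs.append(base / split / game / level)
    # test_data_merge/: one folder per game type, no difficulty split.
    for game in GAME_TYPES:
        dirs.append(base / "test_data_merge" / game)
    return dirs


if __name__ == "__main__":
    for d in expected_dirs("VR-Bench"):
        print(d)
```

If the actual folder names differ, only the two constants at the top need to change.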
34
+
35
+ ### Game Types
36
+
37
+ 1. **Maze**: Classic 2D maze navigation
38
+ 2. **Irregular Maze**: Non-standard maze layouts
39
+ 3. **Maze3D**: Three-dimensional maze navigation
40
+ 4. **Sokoban**: Box-pushing puzzle game
41
+ 5. **Trapfield**: Navigation with obstacles and traps
42
+
43
+ ### Difficulty Levels
44
+
45
+ Each game type has three difficulty levels:
46
+ - `easy`: Simple layouts with shorter solution paths
47
+ - `medium`: Moderate complexity
48
+ - `hard`: Complex layouts requiring advanced planning
49
+
50
+ ## File Format
51
+
52
+ Each data sample consists of:
53
+ - **Video file** (`.mp4`): Demonstration of gameplay
54
+ - **Image file** (`.png`): Initial state screenshot
55
+ - **JSON file** (`.json`): Game state metadata including:
56
+   - Grid layout and dimensions
57
+   - Entity positions (player, goal, boxes)
58
+   - Bounding box information
59
+   - Render parameters
60
+
61
+ ### JSON Structure
62
+
63
+ ```json
64
+ {
65
+   "version": "1.0",
66
+   "game_type": "maze",
67
+   "entities": {
68
+     "player": {
69
+       "pixel_pos": {"x": 165, "y": 45},
70
+       "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
71
+       "grid_pos": {"row": 1, "col": 5}
72
+     },
73
+     "goal": {
74
+       "pixel_pos": {"x": 105, "y": 165},
75
+       "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
76
+       "grid_pos": {"row": 5, "col": 3}
77
+     }
78
+   },
79
+   "grid": {
80
+     "data": [[1,1,1,...], [1,0,0,...], ...],
81
+     "height": 7,
82
+     "width": 7
83
+   },
84
+   "render": {
85
+     "cell_size": 30,
86
+     "image_width": 210,
87
+     "image_height": 210
88
+   }
89
+ }
90
+ ```
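
In the sample above, the pixel fields appear to follow from the grid fields: each `bbox` starts at `(col * cell_size, row * cell_size)`, each `pixel_pos` is the center of that cell, and the image size is the grid size times `cell_size`. The sketch below checks this relationship. It is inferred from this one example only and may not hold for every game type (Maze3D in particular); the function names are hypothetical.

```python
def cell_geometry(row, col, cell_size):
    """Pixel center and bounding box of a grid cell (relationship inferred from the sample)."""
    left, top = col * cell_size, row * cell_size
    center = {"x": left + cell_size // 2, "y": top + cell_size // 2}
    bbox = {"x": left, "y": top, "width": cell_size, "height": cell_size}
    return center, bbox


def is_consistent(state):
    """Check that pixel fields agree with grid fields under the inferred relationship."""
    grid, render = state["grid"], state["render"]
    cell = render["cell_size"]
    if render["image_width"] != grid["width"] * cell:
        return False
    if render["image_height"] != grid["height"] * cell:
        return False
    for entity in state["entities"].values():
        pos = entity["grid_pos"]
        center, bbox = cell_geometry(pos["row"], pos["col"], cell)
        if entity["pixel_pos"] != center or entity["bbox"] != bbox:
            return False
    return True


# Values copied from the sample above; the elided "data" array is left out.
sample = {
    "entities": {
        "player": {
            "pixel_pos": {"x": 165, "y": 45},
            "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
            "grid_pos": {"row": 1, "col": 5},
        },
        "goal": {
            "pixel_pos": {"x": 105, "y": 165},
            "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
            "grid_pos": {"row": 5, "col": 3},
        },
    },
    "grid": {"height": 7, "width": 7},
    "render": {"cell_size": 30, "image_width": 210, "image_height": 210},
}

print(is_consistent(sample))
```

A check like this can be useful as a sanity pass before training on the bounding boxes, since a mismatch would point to a different rendering convention than the one assumed here.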
91
+
92
+ ### Metadata CSV
93
+
94
+ Each subdirectory contains a `metadata.csv` file with columns:
95
+ - `video`: Video filename
96
+ - `prompt`: Associated text prompt (currently empty)
97
+ - `input_image`: Initial state image filename
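
A minimal sketch of reading such an index with Python's standard `csv` module follows. It assumes the file is comma-separated with a header row naming the three columns; the filenames in the inline example are made up for illustration.

```python
import csv
import io


def load_metadata(fileobj):
    """Read metadata rows, mapping an empty prompt to None."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "video": row["video"],
            "prompt": row["prompt"] or None,  # prompts are currently empty
            "input_image": row["input_image"],
        })
    return rows


# Inline stand-in for a metadata.csv file; the filenames are hypothetical.
example_csv = io.StringIO(
    "video,prompt,input_image\n"
    "sample_0001.mp4,,sample_0001.png\n"
)
rows = load_metadata(example_csv)
print(rows)
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")`, which is the form the `csv` module documentation recommends.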
98
+
99
+ ## Usage
100
+
101
+ This dataset can be used for:
102
+ - **Visual Planning**: Learning to plan sequences of actions from visual input
103
+ - **Multimodal Learning**: Combining video, image, and structured data
104
+ - **Reinforcement Learning**: Training agents on game environments
105
+ - **Video Understanding**: Learning temporal patterns in sequential decision-making
106
+
107
+ ## Dataset Statistics
108
+
109
+ - **Game Types**: 5 (Maze, Irregular Maze, Maze3D, Sokoban, Trapfield)
110
+ - **Difficulty Levels**: 3 per game type
111
+ - **Data Splits**: Training and test sets
112
+ - **File Types**: Video (.mp4), Images (.png), Metadata (.json), Index (.csv)
113
+
114
+ ## Citation
115
+
116
+ If you use this dataset in your research, please cite:
117
+
118
+ ```bibtex
119
+ @dataset{vr_bench_2025,
120
+ title={VR-Bench: A Multimodal Video Reasoning Benchmark},
121
+ author={[Author Name]},
122
+ year={2025},
123
+ url={https://huggingface.co/datasets/[username]/VR-Bench}
124
+ }
125
+ ```
126
+
127
+ ## License
128
+
129
+ This dataset is released under the MIT License.