For training data, we combine subsets of two datasets: LLaVA-Video-178K and Valley.
| Stage | Source | #Samples |
| --- | --- | --- |
| Pretrain | LLaVA-Video-178K + Valley | 397k |
| Finetune | LLaVA-Video-178K | 491k |
## Pretrain Data
We use four subsets of LLaVA-Video-178K: `0_30_s_academic_v0_1`, `30_60_s_academic_v0_1`, `0_30_s_youtube_v0_1`, and `30_60_s_youtube_v0_1`, supplemented with the filtered Valley data from Video-LLaVA.
We provide the cleaned annotation files; the video data can be downloaded from LLaVA-Video-178K and Video-LLaVA.
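Assuming each cleaned annotation file is a JSON array of sample records (an assumption; check the released files for the exact schema), the caption and open-ended QA annotations can be merged into a single sample list like this:

```python
import json
from pathlib import Path

# File names match the text_files/ layout described below.
ANNOTATION_FILES = (
    "cleaned_video_caption.json",
    "cleaned_video_openqa.json",
)

def load_annotations(text_dir):
    """Merge the cleaned annotation files found in text_dir into one list.

    Assumes each file is a JSON array of sample dicts; files that are
    absent are simply skipped.
    """
    samples = []
    for name in ANNOTATION_FILES:
        path = Path(text_dir) / name
        if path.exists():
            with open(path, "r", encoding="utf-8") as f:
                samples.extend(json.load(f))
    return samples
```

This is only a sketch for inspecting the data; the actual training code may consume the files differently.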
## Finetune Data
We use four subsets of LLaVA-Video-178K: `0_30_s_academic_v0_1`, `30_60_s_academic_v0_1`, `0_30_s_youtube_v0_1`, and `30_60_s_youtube_v0_1`.
We provide the cleaned annotation files; the video data can be downloaded from LLaVA-Video-178K.
## Organize Data
Organize the video files and annotation files as follows in `path/to/your/dataset`:
```
dataset
├── academic_source
├── liwei_youtube_videos
├── valley
└── text_files
    ├── cleaned_video_caption.json
    └── cleaned_video_openqa.json
```
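As a quick sanity check before training, a minimal sketch (not part of the repo) that verifies the layout above is in place:

```python
from pathlib import Path

# Top-level video directories and annotation files from the layout above.
EXPECTED_DIRS = ("academic_source", "liwei_youtube_videos", "valley")
EXPECTED_FILES = (
    "text_files/cleaned_video_caption.json",
    "text_files/cleaned_video_openqa.json",
)

def check_dataset_layout(root):
    """Return the list of expected paths missing under root (empty if OK)."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Running `check_dataset_layout("path/to/your/dataset")` and confirming it returns an empty list helps catch download or extraction mistakes early.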
Note: If there is any infringement, please contact us for removal.