TinyLLaVA-Video

arXiv | GitHub

For training data, we combine subsets of two datasets: LLaVA-Video-178K and Valley.

| Stage | Source | #Samples |
|---|---|---|
| Pretrain | LLaVA-Video-178K + Valley | 397k |
| Finetune | LLaVA-Video-178K | 491k |

Pretrain Data

We use four subsets of LLaVA-Video-178K (0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, and 30_60_s_youtube_v0_1), supplemented with filtered Valley data released with Video-LLaVA.

We provide cleaned annotation files; the video data can be downloaded from LLaVA-Video-178K and Video-LLaVA.

Finetune Data

We use four subsets of LLaVA-Video-178K: 0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, and 30_60_s_youtube_v0_1.

We provide cleaned annotation files; the video data can be downloaded from LLaVA-Video-178K.
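A minimal sketch for inspecting one of the cleaned annotation files before training. This assumes the file is a single JSON array of dict samples (verify against the actual release); the path below is a placeholder.

```python
import json
from collections import Counter

def summarize_annotations(path):
    """Load a cleaned annotation file and report its sample count
    and per-key frequencies.

    Assumes the file is one JSON array of dict samples; adjust if
    the released format differs.
    """
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    key_counts = Counter(k for sample in samples for k in sample)
    return len(samples), key_counts

# Example (hypothetical path):
# n, keys = summarize_annotations("path/to/your/dataset/text_files/cleaned_video_openqa.json")
```

This is handy for a quick sanity check that the download is intact and for seeing which fields the samples carry.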

Organize Data

Organize the video files and annotation files as follows in path/to/your/dataset:

```
dataset
├── academic_source
├── liwei_youtube_videos
├── valley
└── text_files
    ├── cleaned_video_caption.json
    └── cleaned_video_openqa.json
```

Note: If there is any infringement, please contact us for removal.
