metadata

datasets:
  - tomg-group-umd/cinepile
language:
  - en
license: apache-2.0

Model Card for Video-LLaVA - CinePile fine tune

Fine-tuned model taking as bases Video-LlaVA to evaluate its performance on CinePile.

Model Sources

Repository: Github with fine-tunning and inference notebook.

Uses

Although the model can answer questions based on the content, it is specifically optimized for addressing CinePile-related queries. When the questions do not follow a CinePile-specific prompt, the inference section of the notebook is designed to refine and clean up the text produced by the model.

Results

Extending CinePile's Model Evaluations arxiv

Model	Average	Character and relationship dynamics	Narrative and Plot Analysis	Setting and Technical Analysis	Temporal	Theme Exploration
Human	73.21	82.92	75	73	75.52	64.93
Human (authors)	86	92	87.5	71.2	100	75
GPT-4o	59.72	64.36	74.08	54.77	44.91	67.89
GPT-4 Vision	58.81	63.73	73.43	52.55	46.22	65.79
Gemini 1.5 Pro	61.36	65.17	71.01	59.57	46.75	63.27
Gemini 1.5 Flash	57.52	61.91	69.15	54.86	41.34	61.22
Gemini Pro Vision	50.64	54.16	65.5	46.97	35.8	58.82
Claude 3 (Opus)	45.6	48.89	57.88	40.73	37.65	47.89
Video LlaVa - CinePile fine tune	44.16	45.26	45.14	46.93	32.55	49.47
Video LLaVa	22.51	23.11	25.92	20.69	22.38	22.63
mPLUG-Owl	10.57	10.65	11.04	9.18	11.89	15.05
Video-ChatGPT	14.55	16.02	14.83	15.54	6.88	18.86
MovieChat	4.61	4.95	4.29	5.23	2.48	4.21