mfarre HF staff committed on
Commit
a2399b5
1 Parent(s): c924554

Update README.md

Files changed (1)
  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - tomg-group-umd/cinepile
5
+ language:
6
+ - en
7
+ ---
8
+ # Model Card for Video-LLaVA - CinePile fine-tune
9
+
10
+ <!-- Provide a quick summary of what the model is/does. -->
11
+
12
+ Fine-tuned model that uses [Video-LLaVA](https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf) as its base, built to evaluate its performance on CinePile.
13
+
14
+
15
+
16
+ ## Model Sources
17
+
18
+ <!-- Provide the basic links for the model. -->
19
+
20
+ - **Repository:** [GitHub](https://github.com/mfarre/Video-LLaVA-7B-hf-CinePile) with the fine-tuning and inference notebook.
21
+ ## Uses
22
+
23
+
24
+ Although the model can answer questions about video content in general, it is specifically optimized for CinePile-style questions.
25
+ For questions that do not follow a CinePile-specific prompt, the inference section of the notebook refines and cleans up the text produced by the model.
26
+
27
+ ## Results
28
+ Extending the model evaluations from the CinePile paper ([arXiv](https://arxiv.org/abs/2405.08813)):
29
+
30
+ | Model                          | Average | Character and Relationship Dynamics | Narrative and Plot Analysis | Setting and Technical Analysis | Temporal | Theme Exploration |
31
+ |--------------------------------|---------|-------------------------------------|-----------------------------|--------------------------------|----------|-------------------|
32
+ | Human | 73.21 | 82.92 | 75 | 73 | 75.52 | 64.93 |
33
+ | Human (authors) | 86 | 92 | 87.5 | 71.2 | 100 | 75 |
34
+ | GPT-4o | 59.72 | 64.36 | 74.08 | 54.77 | 44.91 | 67.89 |
35
+ | GPT-4 Vision | 58.81 | 63.73 | 73.43 | 52.55 | 46.22 | 65.79 |
36
+ | Gemini 1.5 Pro | 61.36 | 65.17 | 71.01 | 59.57 | 46.75 | 63.27 |
37
+ | Gemini 1.5 Flash | 57.52 | 61.91 | 69.15 | 54.86 | 41.34 | 61.22 |
38
+ | Gemini Pro Vision | 50.64 | 54.16 | 65.5 | 46.97 | 35.8 | 58.82 |
39
+ | Claude 3 (Opus) | 45.6 | 48.89 | 57.88 | 40.73 | 37.65 | 47.89 |
40
+ | **Video-LLaVA - CinePile fine-tune** | **44.16** | **45.26** | **45.14** | **46.93** | **32.55** | **49.47** |
41
+ | Video-LLaVA                    | 22.51   | 23.11                               | 25.92                       | 20.69                          | 22.38    | 22.63             |
42
+ | mPLUG-Owl | 10.57 | 10.65 | 11.04 | 9.18 | 11.89 | 15.05 |
43
+ | Video-ChatGPT | 14.55 | 16.02 | 14.83 | 15.54 | 6.88 | 18.86 |
44
+ | MovieChat | 4.61 | 4.95 | 4.29 | 5.23 | 2.48 | 4.21 |
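
To make the comparison between the fine-tuned model and its base easier to read, the per-category difference between the two Video-LLaVA rows can be computed directly. The following is only a minimal sketch: the scores are copied from the table above, while the names `CATEGORIES`, `FINE_TUNED`, `BASE`, and `absolute_gains` are hypothetical and do not come from the repository or its notebook.

```python
# Scores copied from the results table above (same scale as the table).
# All names below are illustrative, not part of the original notebook.
CATEGORIES = [
    "Average",
    "Character and relationship dynamics",
    "Narrative and Plot Analysis",
    "Setting and Technical Analysis",
    "Temporal",
    "Theme Exploration",
]
FINE_TUNED = [44.16, 45.26, 45.14, 46.93, 32.55, 49.47]  # fine-tuned row
BASE = [22.51, 23.11, 25.92, 20.69, 22.38, 22.63]        # base Video-LLaVA row


def absolute_gains(tuned, base):
    """Return the score difference (tuned minus base) for each category."""
    return {
        name: round(t - b, 2)
        for name, t, b in zip(CATEGORIES, tuned, base)
    }


gains = absolute_gains(FINE_TUNED, BASE)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

On these numbers the difference is smallest for the Temporal category and largest for Theme Exploration.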