hexuan21 committed (verified)
Commit 6317426 · Parent(s): 9453e50

Update README.md

Files changed (1): README.md (+21 -20)

README.md CHANGED
@@ -33,26 +33,27 @@ For the first two benchmarks, we take the Spearman correlation between the model's output
  averaged among all the evaluation aspects as the indicator.
  For GenAI-Bench and VBench, which include human preference data among two or more videos,
  we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
- | metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
- |-------------------|:---------------:|:--------------:|:-----------:|:-----------:|:------:|
- | MantisScore (reg) | 278.3 | 75.7 | 51.1 | 78.5 | 73.0 |
- | MantisScore (gen) | 222.4 | 77.1 | 27.6 | 59.0 | 58.7 |
- | Gemini-1.5-Pro | 158.8 | 22.1 | 22.9 | 60.9 | 52.9 |
- | Gemini-1.5-Flash | 157.5 | 20.8 | 17.3 | 67.1 | 52.3 |
- | GPT-4o | 155.4 | 23.1 | 28.7 | 52.0 | 51.7 |
- | CLIP-sim | 126.8 | 8.9 | 36.2 | 34.2 | 47.4 |
- | DINO-sim | 121.3 | 7.5 | 32.1 | 38.5 | 43.3 |
- | SSIM-sim | 118.0 | 13.4 | 26.9 | 34.1 | 43.5 |
- | CLIP-Score | 114.4 | -7.2 | 21.7 | 45.0 | 54.9 |
- | LLaVA-1.5-7B | 108.3 | 8.5 | 10.5 | 49.9 | 39.4 |
- | LLaVA-1.6-7B | 93.3 | -3.1 | 13.2 | 44.5 | 38.7 |
- | X-CLIP-Score | 92.9 | -1.9 | 13.3 | 41.4 | 40.1 |
- | PIQE | 78.3 | -10.1 | -1.2 | 34.5 | 55.1 |
- | BRISQUE | 75.9 | -20.3 | 3.9 | 38.5 | 53.7 |
- | Idefics2 | 73.0 | 6.5 | 0.3 | 34.6 | 31.7 |
- | SSIM-dyn | 42.5 | -5.5 | -17.0 | 28.4 | 36.5 |
- | MES-dyn | 36.7 | -12.9 | -26.4 | 31.4 | 44.5 |
-
+ | metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
+ |-------------------|:---------------:|:--------------:|:-----------:|:-----------:|:------:|
+ | MantisScore (reg) | **278.3** | 75.7 | **51.1** | **78.5** | **73.0** |
+ | MantisScore (gen) | 222.4 | **77.1** | 27.6 | 59.0 | 58.7 |
+ | Gemini-1.5-Pro | <u>158.8</u> | 22.1 | 22.9 | 60.9 | 52.9 |
+ | Gemini-1.5-Flash | 157.5 | 20.8 | 17.3 | <u>67.1</u> | 52.3 |
+ | GPT-4o | 155.4 | <u>23.1</u> | 28.7 | 52.0 | 51.7 |
+ | CLIP-sim | 126.8 | 8.9 | <u>36.2</u> | 34.2 | 47.4 |
+ | DINO-sim | 121.3 | 7.5 | 32.1 | 38.5 | 43.3 |
+ | SSIM-sim | 118.0 | 13.4 | 26.9 | 34.1 | 43.5 |
+ | CLIP-Score | 114.4 | -7.2 | 21.7 | 45.0 | 54.9 |
+ | LLaVA-1.5-7B | 108.3 | 8.5 | 10.5 | 49.9 | 39.4 |
+ | LLaVA-1.6-7B | 93.3 | -3.1 | 13.2 | 44.5 | 38.7 |
+ | X-CLIP-Score | 92.9 | -1.9 | 13.3 | 41.4 | 40.1 |
+ | PIQE | 78.3 | -10.1 | -1.2 | 34.5 | <u>55.1</u> |
+ | BRISQUE | 75.9 | -20.3 | 3.9 | 38.5 | 53.7 |
+ | Idefics2 | 73.0 | 6.5 | 0.3 | 34.6 | 31.7 |
+ | SSIM-dyn | 42.5 | -5.5 | -17.0 | 28.4 | 36.5 |
+ | MES-dyn | 36.7 | -12.9 | -26.4 | 31.4 | 44.5 |
+
+ The best score in the MantisScore series is in bold; the best among the baselines is underlined.
 
  ## Usage
  ### Installation
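
For reference, here is a minimal sketch of how the two indicators described above could be computed: Spearman correlation per evaluation aspect (averaged) for the rating benchmarks, and pairwise accuracy for the preference benchmarks. This is an illustration only, not the repository's evaluation code; the data, the dictionary names, and the use of `scipy.stats.spearmanr` are assumptions.

```python
# Illustrative sketch only (hypothetical data; not the repo's evaluation code).
from scipy.stats import spearmanr

# VideoEval-test / EvalCrafter: Spearman correlation between model and human
# scores, computed per evaluation aspect and then averaged across aspects.
aspect_scores = {  # aspect -> (model scores, human scores) over the same videos
    "visual_quality":       ([3.1, 2.4, 4.0, 1.5], [3.0, 2.0, 4.0, 1.0]),
    "temporal_consistency": ([1.9, 3.5, 2.8, 3.9], [2.0, 3.0, 3.0, 4.0]),
}
rhos = [spearmanr(m, h).correlation for m, h in aspect_scores.values()]
avg_spearman = sum(rhos) / len(rhos)

# GenAI-Bench / VBench: pairwise accuracy. For each human-annotated pair,
# check whether the model's scores rank the two videos the same way.
model_score = {"vid_a": 0.71, "vid_b": 0.64, "vid_c": 0.58}
human_pref = {("vid_a", "vid_b"): "vid_a",  # annotator prefers vid_a
              ("vid_b", "vid_c"): "vid_c"}  # annotator prefers vid_c
correct = sum((model_score[x] > model_score[y]) == (winner == x)
              for (x, y), winner in human_pref.items())
pairwise_acc = correct / len(human_pref)

print(f"avg Spearman: {avg_spearman:.3f}  pairwise accuracy: {pairwise_acc:.3f}")
```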