hexuan21 committed
Commit 364b314 • 1 Parent(s): 2a645f6

update utils.py

Files changed (2)
  1. app.py +1 -1
  2. app_utils.py → utils.py +44 -26
app.py CHANGED
@@ -1,4 +1,4 @@
- from app_utils import *
+ from utils import *
 
  global data_component
 
app_utils.py → utils.py RENAMED
@@ -26,42 +26,60 @@ CSV_DIR = "./VideoScore-Leaderboard/results.csv"
  COLUMN_NAMES = MODEL_INFO
 
  LEADERBORAD_INTRODUCTION = """# VideoScore Leaderboard
+
+ 🏆 Welcome to the **VideoScore Leaderboard**! The leaderboard covers many popular text-to-video generative models and evaluates them on four dimensions: <br>
+
+ "Visual Quality", "Temporal Consistency", "Dynamic Degree", "Text-to-Video Alignment".
+
+ To demonstrate the performance of VideoScore,
+ we use VideoScore to choose the best video among those generated with the same prompt but different seeds.
+ Then we use several feature-based metrics mentioned in both the <a href="https://arxiv.org/abs/2406.15252">VideoScore paper</a>
+ and the <a href="https://arxiv.org/abs/2310.11440">EvalCrafter paper</a>;
+ see the second sheet, "About", above for more information on these metrics.
 
+
+ <a href='https://hits.seeyoufarm.com'><img src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fhuggingface.co%2Fspaces%2FTIGER-Lab%2FTheoremQA-Leaderboard&count_bg=%23C7C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false'></a>
  """
 
  TABLE_INTRODUCTION = """
  """
 
  LEADERBORAD_INFO = """
- We list the information of the used datasets as follows:<br>
+ Here is the detailed information on the metrics used. <br>
+
+ <a href="https://arxiv.org/abs/2406.15252">VideoScore</a> and <a href="https://arxiv.org/abs/2310.11440">EvalCrafter</a> both
+ study the correlation between these feature-based metrics (such as CLIP-Score and SSIM) and human scores on generated videos.
+ Some of these metrics correlate relatively well with human scores, while others correlate poorly. <br>
+
+ Below are the metrics for each dimension. Unless noted otherwise, each metric's raw score lies in [0, 1] with larger being better, and is then scaled to [0, 100]. <br>
+
+ (1) Visual Quality = average(VQA_A, VQA_T) <br>
+
+ VQA_A and VQA_T are both from the EvalCrafter metrics suite.
+
+ (2) Temporal Consistency = average(CLIP_Temp, Face_Consistency_Score, Warping_Error) <br>
+
+ CLIP_Temp, Face_Consistency_Score, and Warping_Error are all from the EvalCrafter metrics suite.
+
+ Warping_Error is reported as "100*(1 - raw_result)" so that a larger score indicates better performance.
+
+ (3) Dynamic Degree = average(SSIM_dyn, MSE_dyn) <br>
+
+ SSIM_dyn and MSE_dyn are both from VideoScore.
+
+ SSIM_dyn is reported as "100*(1 - raw_result)" so that a larger score indicates better performance.
+
+ MSE_dyn is reported as "100*(1 - raw_result/255^2)", since pixel values range from 0 to 255 and the theoretical maximum of MSE is 255*255.
+
+ (4) Text-to-Video Alignment = average(CLIP-Score, BLIP-BLEU) <br>
+
+ CLIP-Score and BLIP-BLEU are both from the EvalCrafter metrics suite.
 
  """
 
- CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
- CITATION_BUTTON_TEXT = r"""@inproceedings{hendrycks2021measuring,
- title={Measuring Mathematical Problem Solving With the MATH Dataset},
- author={Hendrycks, Dan and Burns, Collin and Kadavath, Saurav and Arora, Akul and Basart, Steven and Tang, Eric and Song, Dawn and Steinhardt, Jacob},
- booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
- year={2021}
- }
- }"""
-
- SUBMIT_INTRODUCTION = """# Submit on Science Leaderboard Introduction
-
- ## ⚠ Please note that you need to submit the json file with following format:
-
- ```json
- {
- "Model": "[NAME]",
- "Repo": "https://huggingface.co/[MODEL_NAME]"
- "TheoremQA": 50,
- "MATH": 50,
- "GSM": 50,
- "GPQA": 50,
- "MMLU-STEM": 50
- }
- ```
- After submitting, you can click the "Refresh" button to see the updated leaderboard(it may takes few seconds).
+ CITATION_BUTTON_LABEL = "Copy the following snippet to cite the t2v models and the used metrics"
+ CITATION_BUTTON_TEXT = r"""
+
 
  """
 
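The new LEADERBORAD_INFO text above specifies how each leaderboard dimension is built from raw feature-based metrics: per-dimension averaging, a default 0-1 → 0-100 scaling, and inverted scalings for Warping_Error, SSIM_dyn, and MSE_dyn. The snippet below is only an illustrative sketch of that aggregation under those stated rules; the function names, dictionary layout, and example raw values are hypothetical and are not part of this repository or of the commit above.

```python
# Hypothetical sketch of the score aggregation described in LEADERBORAD_INFO.
# Only the scaling formulas follow the diff text; everything else is made up.

def scale(raw: float) -> float:
    """Default case: raw metric in [0, 1], larger is better, scaled to [0, 100]."""
    return 100 * raw

def scale_inverted(raw: float) -> float:
    """Warping_Error and SSIM_dyn: 100 * (1 - raw), so a larger score is better."""
    return 100 * (1 - raw)

def scale_mse(raw: float) -> float:
    """MSE_dyn: 100 * (1 - raw / 255**2); 255^2 is the theoretical maximum MSE for 0-255 pixels."""
    return 100 * (1 - raw / 255 ** 2)

def average(*scores: float) -> float:
    return sum(scores) / len(scores)

# Made-up raw metric outputs for a single text-to-video model.
raw = {
    "VQA_A": 0.71, "VQA_T": 0.65,                       # EvalCrafter metrics suite
    "CLIP_Temp": 0.98, "Face_Consistency_Score": 0.95,  # EvalCrafter metrics suite
    "Warping_Error": 0.12,                              # EvalCrafter metrics suite
    "SSIM_dyn": 0.40, "MSE_dyn": 900.0,                 # VideoScore
    "CLIP-Score": 0.30, "BLIP-BLEU": 0.25,              # EvalCrafter metrics suite
}

visual_quality = average(scale(raw["VQA_A"]), scale(raw["VQA_T"]))
temporal_consistency = average(
    scale(raw["CLIP_Temp"]),
    scale(raw["Face_Consistency_Score"]),
    scale_inverted(raw["Warping_Error"]),
)
dynamic_degree = average(scale_inverted(raw["SSIM_dyn"]), scale_mse(raw["MSE_dyn"]))
t2v_alignment = average(scale(raw["CLIP-Score"]), scale(raw["BLIP-BLEU"]))

print(visual_quality, temporal_consistency, dynamic_degree, t2v_alignment)
```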