tttoaster committed
Commit d093c93
1 Parent(s): 8b3d9a9

Update constants.py

Files changed (1)
  1. constants.py +7 -0
constants.py CHANGED
@@ -80,12 +80,19 @@ SUBMIT_INTRODUCTION = """# Submit on SEED Benchmark Introduction
 
 TABLE_INTRODUCTION = """In the table below, we summarize the per-task performance of all the models.
 We use accuracy (%) as the primary evaluation metric for each task.
+
 SEED-Bench-1 calculates the overall accuracy by dividing the total number of correct QA answers by the total number of QA questions.
+
 SEED-Bench-2 computes the overall accuracy as the average of the per-dimension accuracies.
+
 For the PPL evaluation method, we compute the loss for each answer candidate and select the candidate with the lowest loss. For details, please refer to [InternLM_Xcomposer_VL_interface](https://github.com/AILab-CVC/SEED-Bench/blob/387a067b6ba99ae5e8231f39ae2d2e453765765c/SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py#L74).
+
 For the PPL A/B/C/D evaluation method, please refer to [EVAL_SEED.md](https://github.com/QwenLM/Qwen-VL/blob/master/eval_mm/seed_bench/EVAL_SEED.md) for more information.
+
 For the Generate evaluation method, please refer to [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#seed-bench) for details.
+
 NG indicates that the evaluation method is Not Given.
+
 If you have any questions, please feel free to contact us.
 """
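The two overall-accuracy conventions described in the string above give different numbers whenever dimensions contain unequal question counts. A minimal sketch of both rules, using hypothetical per-dimension `(correct, total)` counts that are not from the benchmark:

```python
# Sketch of the two overall-accuracy conventions described above.
# The per-dimension counts below are hypothetical placeholders.

def seed_bench_1_overall(per_dim: dict[str, tuple[int, int]]) -> float:
    """SEED-Bench-1 style micro-average: total correct / total questions."""
    correct = sum(c for c, _ in per_dim.values())
    total = sum(t for _, t in per_dim.values())
    return 100.0 * correct / total

def seed_bench_2_overall(per_dim: dict[str, tuple[int, int]]) -> float:
    """SEED-Bench-2 style macro-average: mean of per-dimension accuracies."""
    accs = [100.0 * c / t for c, t in per_dim.values()]
    return sum(accs) / len(accs)

counts = {"scene": (90, 100), "instance": (30, 50)}  # hypothetical
print(seed_bench_1_overall(counts))  # 80.0  (120 correct / 150 questions)
print(seed_bench_2_overall(counts))  # 75.0  (mean of 90% and 60%)
```

The gap between 80.0 and 75.0 here is exactly why the table needs to state which convention each benchmark version uses.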
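The PPL rule itself (score every answer candidate by language-model loss, pick the lowest) can be sketched as below. This is an illustrative stand-in, not the SEED-Bench code: `gpt2` is a placeholder model, the prompts are toy text, and the linked interfaces score image-conditioned inputs with their own tokenizers and loss masking.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for illustration only; SEED-Bench's PPL interfaces
# use the evaluated multimodal model instead.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def candidate_loss(prompt: str, candidate: str) -> float:
    """Mean cross-entropy over the candidate's tokens, given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + " " + candidate, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # mask prompt; score answer only
    return model(full_ids, labels=labels).loss.item()

def select_by_ppl(prompt: str, candidates: list[str]) -> str:
    # The PPL rule from the text: the lowest-loss candidate wins.
    return min(candidates, key=lambda c: candidate_loss(prompt, c))

print(select_by_ppl("Q: What color is the sky? A:", ["blue", "a sandwich"]))
```

The PPL A/B/C/D variant differs only in what gets scored: the loss is computed over the option letter rather than the full answer text, per the linked EVAL_SEED.md.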