Spaces: SeaEval/SeaEval_Leaderboard
binwang committed on Apr 12

Commit b1aae63 • 1 Parent(s): a591976

head

Files changed (1)

app.py CHANGED

@@ -2203,15 +2203,16 @@ block = gr.Blocks(theme='rottenlittlecreature/Moon_Goblin')
 with block:
     gr.Markdown(f"""
-    SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>.  Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
+    #### SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>.  Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
     - **Number of Datasets**: > 30
     - **Number of Languages**: > 8
     - **Number of Models**: {NUM_MODELS}
     - **Mode of Evaluation**: Zero-Shot, Five-Shot
     Know Issues:
     - For base models, the output of base model is not truncated as no EOS detected. Evaluation could be affected, especially with length-aware metrics.
     The following table shows the performance of the models on the SeaEval benchmark.
     - For **Zero-shot** performance, it is the median value from 5 distinct prompts shown on the above leaderboard to mitigate the influence of random variations induced by prompts.
     - (-1) value indicates the results are ready yet.
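
The Markdown text in this hunk says the zero-shot number is the median over 5 distinct prompts, and that (-1) marks results that are, presumably, not ready yet. A minimal sketch of that aggregation follows. It is not taken from app.py: the function name `zero_shot_score`, the constant `NOT_READY`, and the rule that a single missing prompt makes the whole entry (-1) are all assumptions made for illustration.

```python
from statistics import median

# Assumption: -1 is the placeholder the leaderboard text uses for missing results.
NOT_READY = -1


def zero_shot_score(prompt_scores):
    """Return the median of per-prompt scores (hypothetical helper).

    Assumption: if the list is empty or any prompt's score is still the
    -1 placeholder, the aggregated entry is reported as -1 as well.
    """
    if not prompt_scores or any(s == NOT_READY for s in prompt_scores):
        return NOT_READY
    return median(prompt_scores)


if __name__ == "__main__":
    # Five made-up scores, one per prompt; the median is the middle value.
    print(zero_shot_score([61.2, 58.9, 63.4, 60.1, 59.7]))  # 60.1
    # One prompt has no result yet, so the entry stays at the placeholder.
    print(zero_shot_score([61.2, NOT_READY, 63.4, 60.1, 59.7]))  # -1
```

Using the median rather than the mean means one unusually good or bad prompt shifts the reported number less, which matches the stated goal of mitigating prompt-induced variation.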