binwang committed on
Commit
b1aae63
1 Parent(s): a591976
Files changed (1)
  1. app.py +3 -2
app.py CHANGED
@@ -2203,15 +2203,16 @@ block = gr.Blocks(theme='rottenlittlecreature/Moon_Goblin')
 
 
  with block:
-
  gr.Markdown(f"""
- SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>. Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
+ #### SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>. Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
  - **Number of Datasets**: > 30
  - **Number of Languages**: > 8
  - **Number of Models**: {NUM_MODELS}
  - **Mode of Evaluation**: Zero-Shot, Five-Shot
+
  Know Issues:
  - For base models, the output of base model is not truncated as no EOS detected. Evaluation could be affected, especially with length-aware metrics.
+
  The following table shows the performance of the models on the SeaEval benchmark.
  - For **Zero-shot** performance, it is the median value from 5 distinct prompts shown on the above leaderboard to mitigate the influence of random variations induced by prompts.
  - (-1) value indicates the results are ready yet.
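
The Zero-shot note in the Markdown text above says the reported value is the median over 5 distinct prompts. A minimal sketch of that aggregation follows. It is not taken from `app.py`: the function name `zero_shot_score`, the `NOT_READY` constant, and the sample scores are all hypothetical, and treating an empty score list as the `-1` placeholder is an assumption based on the last bullet.

```python
from statistics import median

# Assumption: -1 is the placeholder the leaderboard text uses for missing results.
NOT_READY = -1


def zero_shot_score(prompt_scores):
    """Return the median of the per-prompt scores, or NOT_READY if there are none."""
    if not prompt_scores:
        return NOT_READY
    return median(prompt_scores)


# Made-up scores for one model on one dataset, one entry per prompt.
scores = [61.2, 58.9, 63.4, 60.1, 59.7]
print(zero_shot_score(scores))
```

Taking the median rather than the mean keeps a single unusually good or bad prompt from shifting the reported number, which matches the stated goal of reducing prompt-induced variation.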