head
app.py
CHANGED
@@ -2203,15 +2203,16 @@ block = gr.Blocks(theme='rottenlittlecreature/Moon_Goblin')
 
 
 with block:
-
     gr.Markdown(f"""
-SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>. Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
+#### SeaEval Leaderboard. To submit, refer to the <a href="https://seaeval.github.io/" target="_blank" style="text-decoration: underline">SeaEval Website</a>. Refer to the [SeaEval paper](https://arxiv.org/abs/2309.04766) for details on metrics, tasks and models.
 - **Number of Datasets**: > 30
 - **Number of Languages**: > 8
 - **Number of Models**: {NUM_MODELS}
 - **Mode of Evaluation**: Zero-Shot, Five-Shot
+
 Know Issues:
 - For base models, the output of base model is not truncated as no EOS detected. Evaluation could be affected, especially with length-aware metrics.
+
 The following table shows the performance of the models on the SeaEval benchmark.
 - For **Zero-shot** performance, it is the median value from 5 distinct prompts shown on the above leaderboard to mitigate the influence of random variations induced by prompts.
 - (-1) value indicates the results are ready yet.
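The Markdown text in this hunk says the zero-shot figure is the median over 5 distinct prompts, and that -1 marks a missing result. A minimal sketch of that aggregation follows. The function name `zero_shot_score`, the constant `NOT_READY`, and the sample scores are hypothetical and not taken from app.py. Returning -1 when any prompt score is missing is also an assumption, since the diff does not show how partial results are handled.

```python
from statistics import median

# Sentinel the leaderboard text uses for results that are unavailable.
NOT_READY = -1


def zero_shot_score(prompt_scores):
    """Return the median of per-prompt scores.

    Assumption: if the list is empty or any prompt is still marked
    NOT_READY, the aggregate is reported as NOT_READY as well.
    """
    if not prompt_scores or any(s == NOT_READY for s in prompt_scores):
        return NOT_READY
    return median(prompt_scores)


# Illustrative scores for one model on one dataset, one per prompt.
scores = [61.2, 58.9, 63.4, 60.1, 59.7]
print(zero_shot_score(scores))  # 60.1
```

Using the median rather than the mean keeps a single unusually good or bad prompt from shifting the reported number, which matches the stated goal of reducing prompt-induced variation.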