Update content.py
content.py +2 -2
@@ -14,8 +14,8 @@ Both multilingual and language-specific LLMs are welcome in this leaderboard.
 We currently evaluate models over four benchmarks:
 
 - <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot)
-- <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (
-- <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (
+- <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (0-shot)
+- <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (25-shot)
 - <a href="https://arxiv.org/abs/2109.07958" target="_blank"> TruthfulQA </a> (0-shot)
 
 The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).
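The shot counts in this list give the number of solved examples placed before each test question. The leaderboard's actual evaluation code is not part of this diff. The following is only a minimal sketch of how k-shot prompt construction can work, using the settings from the updated `content.py`. `NUM_FEWSHOT`, `build_prompt`, and the `Question:`/`Answer:` template are hypothetical and were chosen for illustration.

```python
from __future__ import annotations

# Hypothetical sketch, NOT the leaderboard's evaluation code.
# Few-shot settings as listed in the updated content.py.
NUM_FEWSHOT: dict[str, int] = {
    "AI2 Reasoning Challenge": 25,
    "HellaSwag": 0,
    "MMLU": 25,
    "TruthfulQA": 0,
}


def build_prompt(demos: list[tuple[str, str]], query: str, k: int) -> str:
    """Prepend the first k solved (question, answer) demos to the query.

    The "Question:/Answer:" template is an illustrative assumption;
    real harnesses use task-specific templates.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k > len(demos):
        raise ValueError(f"need {k} demonstrations, got {len(demos)}")
    # Each demonstration shows a question together with its answer.
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in demos[:k]]
    # The test question comes last, with its answer left for the model.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")]
    # 0-shot (e.g. HellaSwag, TruthfulQA above): only the test question.
    print(build_prompt(demos, "1 + 6 = ?", k=NUM_FEWSHOT["HellaSwag"]))
    print("---")
    # 2-shot: two solved examples precede the test question.
    print(build_prompt(demos, "1 + 6 = ?", k=2))
```

With k = 0 the model sees only the test question. A 25-shot task such as MMLU would need 25 demonstrations in the pool, and `build_prompt` rejects a pool that is too small rather than silently using fewer.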