Neubla-MLteam committed
Commit 73dcc35 • 1 Parent(s): 7597bf1
Update src/display/about.py
Files changed: src/display/about.py (+1 -1)
@@ -15,7 +15,7 @@ With the plethora of large language models (LLMs) and chatbots being released we
 
 ## How it works
 
-📈 We evaluate models on
+📈 We evaluate models on 6 key benchmarks using the <a href="https://github.com/EleutherAI/lm-evaluation-harness" target="_blank"> Eleuther AI Language Model Evaluation Harness </a>, a unified framework to test generative language models on a large number of different evaluation tasks.
 
 - <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot) - a set of grade-school science questions.
 - <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
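
For readers who want to see what the few-shot settings in this hunk mean in practice, here is a minimal sketch that turns them into command lines for the evaluation harness. It is not taken from this repository. The task identifiers `arc_challenge` and `hellaswag`, the `lm_eval` entry point with its `--model hf` backend (as found in recent harness releases), the helper name `build_eval_command`, and the model id `my-org/my-model` are all assumptions made for illustration. Only the two benchmarks visible in the hunk are covered.

```python
import shlex

# Few-shot counts quoted in the hunk above. The keys are assumed harness task
# names for these benchmarks; the leaderboard's exact task variants are not
# shown in this diff.
FEWSHOT_BY_TASK = {
    "arc_challenge": 25,  # AI2 Reasoning Challenge (25-shot)
    "hellaswag": 10,      # HellaSwag (10-shot)
}


def build_eval_command(model_id: str, task: str) -> list[str]:
    """Return a harness CLI invocation for one task as an argument list.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    if task not in FEWSHOT_BY_TASK:
        raise KeyError(f"no few-shot setting recorded for task {task!r}")
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(FEWSHOT_BY_TASK[task]),
    ]


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model id, not a real checkpoint.
    for task_name in FEWSHOT_BY_TASK:
        print(shlex.join(build_eval_command("my-org/my-model", task_name)))
```

Keeping the few-shot counts in one mapping means the numbers documented in `about.py` and the numbers passed to the harness can be checked against each other in a single place. Older harness releases use a different entry point, so the argument names may need adjusting.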