jaspercatapang committed
Commit 4f7504f
Parent(s): 4c1ddbe
Update README.md
README.md CHANGED
@@ -32,8 +32,10 @@ According to the leaderboard description, here are the benchmarks used for the evaluation:
 - [HellaSwag](https://arxiv.org/abs/1905.07830) (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
 - [TruthfulQA](https://arxiv.org/abs/2109.07958) (0-shot) - a test to measure a model’s propensity to reproduce falsehoods commonly found online.
 
+A detailed breakdown of the evaluation can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MayaPH__GodziLLa2-70B). Huge thanks to [@thomwolf](https://huggingface.co/thomwolf).
+
 ## Leaderboard Highlights (as of August 17, 2023)
-- Godzilla 2 70B
+- Godzilla 2 70B debuts at 4th place worldwide in the Open LLM Leaderboard.
 - Godzilla 2 70B ranks #3 in the ARC challenge.
 - Godzilla 2 70B ranks #5 in the TruthfulQA benchmark.
 - *Godzilla 2 70B beats GPT-3.5 (ChatGPT) in terms of average performance and the HellaSwag benchmark (87.53 > 85.5).
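For anyone who wants to drill into the per-sample results behind these numbers, the details dataset linked in the added README line can be loaded with the `datasets` library. The sketch below is a minimal example, not part of the commit; the config name `harness_hellaswag_10` and the `latest` split are assumptions about how the leaderboard's details datasets are usually laid out, so check the dataset card for the exact identifiers.

```python
# Minimal sketch: pull per-sample evaluation records for GodziLLa2-70B
# from the Open LLM Leaderboard details dataset linked above.
# ASSUMPTIONS: the config name "harness_hellaswag_10" and the "latest"
# split follow the leaderboard's usual naming; verify on the dataset card.
from datasets import load_dataset

details = load_dataset(
    "open-llm-leaderboard/details_MayaPH__GodziLLa2-70B",
    "harness_hellaswag_10",  # assumed config: HellaSwag, 10-shot
    split="latest",          # assumed split name
)
print(details[0])  # one record: prompt, model output, per-sample metric
```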
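Similarly, since the leaderboard's scores come from EleutherAI's lm-evaluation-harness, one of the few-shot settings listed in the diff (e.g. HellaSwag, 10-shot) could in principle be re-run locally. This is a rough sketch rather than the leaderboard's exact pipeline: the `simple_evaluate` arguments shown match recent harness releases but vary across versions, and a 70B model needs substantial GPU memory.

```python
# Rough sketch of re-running one benchmark with lm-evaluation-harness
# (pip install lm-eval). Argument names follow recent harness releases
# and may differ in other versions; the leaderboard's exact harness
# commit and prompt settings are not reproduced here.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=MayaPH/GodziLLa2-70B",
    tasks=["hellaswag"],
    num_fewshot=10,                               # leaderboard uses 10-shot
)
print(results["results"]["hellaswag"])
```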