Update README.md
README.md CHANGED

```diff
@@ -57,7 +57,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Hugginface Leaderboard
 
 This models is still an early Alpha and we can't guarantee that there isn't any contamination.
-However, the average of **72.15** would earn the #2 spot on the HF leaderboard at the time of writing.
+However, the average of **73.198** would earn the #1 spot on the HF leaderboard at the time of writing and the highest score for a >70b model yet.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -67,7 +67,7 @@ However, the average of **72.15** would earn the #2 spot on the HF leaderboard a
 | TruthfulQA (0-shot)   | 61.42 |
 | Winogrande (5-shot)   | 83.03 |
 | GSM8k (5-shot)        | 68.39 |
-| **Avg.**              | **72.15** |
+| **Avg.**              | **73.198** |
 
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
```
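For context, the harness exposes a Python entry point alongside its CLI. Below is a minimal sketch of a leaderboard-style run, assuming the older `hf-causal` model type and `simple_evaluate` API from the harness revision the leaderboard pinned at the time; the model id is a placeholder, and MMLU (run as the 57 `hendrycksTest-*` subtasks, 5-shot) is omitted for brevity, so treat this as illustrative rather than the exact invocation used for the table above. The leaderboard average is the plain mean of the headline scores.

```python
# Sketch only: task names, metric keys, and the `hf-causal` model type follow
# the older lm-evaluation-harness API pinned by the HF LLM Leaderboard;
# newer harness releases renamed several of these.
from lm_eval import evaluator

MODEL_ARGS = "pretrained=your-org/your-model,use_accelerate=True"  # placeholder model id

# (task, num_fewshot) pairs matching the leaderboard settings in the table above;
# MMLU (hendrycksTest-*, 5-shot, averaged over 57 subtasks) omitted for brevity.
TASKS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("truthfulqa_mc", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
]

scores = {}
for task, n_shot in TASKS:
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=MODEL_ARGS,
        tasks=[task],
        num_fewshot=n_shot,
        batch_size=1,
    )
    metrics = results["results"][task]
    # each task reports a different headline metric: acc_norm for ARC/HellaSwag,
    # acc for Winogrande/GSM8k, mc2 for TruthfulQA
    scores[task] = metrics.get("acc_norm", metrics.get("acc", metrics.get("mc2")))

for task, score in scores.items():
    print(f"{task}: {100 * score:.2f}")
print(f"Avg.: {100 * sum(scores.values()) / len(scores):.2f}")
```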