bjoernp committed
Commit aaeadf1 · 1 parent: d231233

Update README.md

Files changed (1): README.md (+2, -2)
README.md CHANGED
@@ -57,7 +57,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Huggingface Leaderboard
 
 This model is still an early alpha and we can't guarantee that there isn't any contamination.
-However, the average of **72.15** would earn the #2 spot on the HF leaderboard at the time of writing and the highest score for a >70b model yet.
+However, the average of **73.198** would earn the #1 spot on the HF leaderboard at the time of writing and the highest score for a >70b model yet.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -67,7 +67,7 @@ However, the average of **72.15** would earn the #2 spot on the HF leaderboard a
 | TruthfulQA (0-shot) | 61.42 |
 | Winogrande (5-shot) | 83.03 |
 | GSM8k (5-shot) | 68.39 |
-| **Avg.** | **72.15** |
+| **Avg.** | **73.198** |
 
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
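The last context line of the diff names the evaluation setup. As a minimal, hypothetical sketch of what re-running those benchmarks could look like, the snippet below drives lm-evaluation-harness through its Python API with the per-task few-shot counts from the table; the v0.3-era API style, the placeholder model id, and the task list (MMLU omitted) are assumptions on our part, not something this commit specifies.

```python
# Minimal sketch (not from this repo): re-running leaderboard-style
# benchmarks with EleutherAI's lm-evaluation-harness. Assumes a v0.3-era
# harness, the generation the HF LLM Leaderboard used around this time;
# task names and entry points changed in later releases.
from lm_eval import evaluator

# (task, few-shot count) pairs matching the shot counts in the table.
# MMLU is omitted here: in this harness generation it is split across
# many "hendrycksTest-*" subtasks that must be averaged separately.
TASKS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("truthfulqa_mc", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
]

for task, shots in TASKS:
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args="pretrained=your-org/your-model",  # placeholder id
        tasks=[task],
        num_fewshot=shots,
        batch_size=1,
    )
    # results["results"] maps each task name to its metric dict
    # (e.g. acc / acc_norm / mc2, depending on the task).
    print(task, results["results"][task])
```

Averaging the six per-task scores (the five above plus MMLU) is what produces the **73.198** figure this commit records.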