Update README.md
README.md CHANGED

```diff
@@ -57,7 +57,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Hugginface Leaderboard
 
 This models is still an early Alpha and we can't guarantee that there isn't any contamination.
-However, the average of **72.15** would earn the #2 spot on the HF leaderboard at the time of writing.
+However, the average of **73.198** would earn the #1 spot on the HF leaderboard at the time of writing and the highest score for a >70b model yet.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -67,7 +67,7 @@ However, the average of **72.15** would earn the #2 spot on the HF leaderboard a
 | TruthfulQA (0-shot)   | 61.42 |
 | Winogrande (5-shot)   | 83.03 |
 | GSM8k (5-shot)        | 68.39 |
-| **Avg.**              | **72.15** |
+| **Avg.**              | **73.198** |
 
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
```
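For context, the harness exposes a Python entry point alongside its CLI. Below is a minimal sketch of a leaderboard-style run, assuming the older `hf-causal` model type and `simple_evaluate` API from the harness revision the leaderboard pinned at the time; the model id is a placeholder, and MMLU (run as the 57 `hendrycksTest-*` subtasks, 5-shot) is omitted for brevity, so treat this as illustrative rather than the exact invocation used for the table above. The leaderboard average is the plain mean of the headline scores.

```python
# Sketch only: task names, metric keys, and the `hf-causal` model type follow
# the older lm-evaluation-harness API pinned by the HF LLM Leaderboard;
# newer harness releases renamed several of these.
from lm_eval import evaluator

MODEL_ARGS = "pretrained=your-org/your-model,use_accelerate=True"  # placeholder model id

# (task, num_fewshot) pairs matching the leaderboard settings in the table above;
# MMLU (hendrycksTest-*, 5-shot, averaged over 57 subtasks) omitted for brevity.
TASKS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("truthfulqa_mc", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
]

scores = {}
for task, n_shot in TASKS:
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=MODEL_ARGS,
        tasks=[task],
        num_fewshot=n_shot,
        batch_size=1,
    )
    metrics = results["results"][task]
    # each task reports a different headline metric: acc_norm for ARC/HellaSwag,
    # acc for Winogrande/GSM8k, mc2 for TruthfulQA
    scores[task] = metrics.get("acc_norm", metrics.get("acc", metrics.get("mc2")))

for task, score in scores.items():
    print(f"{task}: {100 * score:.2f}")
print(f"Avg.: {100 * sum(scores.values()) / len(scores):.2f}")
```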