bjoernp committed on
Commit
93db18f
1 Parent(s): f7e859a

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -61,7 +61,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Huggingface Leaderboard
 
 This model is still an early Alpha and we can't guarantee that there isn't any contamination.
-However, the average of **71.24** would earn the #3 spot on the HF leaderboard at the time of writing.
+The following are the scores from our own evaluation.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -73,6 +73,12 @@ However, the average of **71.24** would earn the #3 spot on the HF leaderboard a
 | GSM8k (5-shot) | 63.68 |
 | **Avg.** | **71.24** |
 
+The model is now also officially ranked on the Open LLM Leaderboard as #6 overall and as the second-strongest Llama-2-70b based model (ranking only behind TigerBot 70b):
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e3b6ab0c2a907c388e4965/0ZIBCnO08tX44ilGcl8Wb.png)
+(Screenshot from 5 December 2023)
+
+
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
 
 ### FastEval