codelion leaderboard-pr-bot committed on
Commit 8968e16
1 Parent(s): e0a5eb5

Adding Evaluation Results (#2)

- Adding Evaluation Results (1811b87842a5e7872bd2f8b8c55c3a2b10967dfc)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -139,4 +139,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 
 In addition, to the official Open LLM Leaderboard, the results on OpenLLM Eval have been validated by [others as well (76.59)](https://github.com/saucam/model_evals/tree/main?tab=readme-ov-file#model-eval-results).
 
-Our own initial eval is available [here (76.37)](https://gist.github.com/codelion/78f88333230801c9bbaa6fc22078d820).
+Our own initial eval is available [here (76.37)](https://gist.github.com/codelion/78f88333230801c9bbaa6fc22078d820).
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_meraGPT__mera-mix-4x7B)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |75.91|
+|AI2 Reasoning Challenge (25-Shot)|72.95|
+|HellaSwag (10-Shot)              |89.17|
+|MMLU (5-Shot)                    |64.44|
+|TruthfulQA (0-shot)              |77.17|
+|Winogrande (5-shot)              |85.64|
+|GSM8k (5-shot)                   |66.11|
+
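
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores. That is an assumption, since the commit does not say how the average is computed. A minimal Python sketch to check it, with the scores copied from the table and hypothetical variable names:

```python
from statistics import mean

# Benchmark scores copied from the table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.95,
    "HellaSwag (10-Shot)": 89.17,
    "MMLU (5-Shot)": 64.44,
    "TruthfulQA (0-shot)": 77.17,
    "Winogrande (5-shot)": 85.64,
    "GSM8k (5-shot)": 66.11,
}

# Value reported in the "Avg." row of the same table.
reported_avg = 75.91

# Assumption: the average is a plain arithmetic mean, rounded to two decimals.
computed_avg = round(mean(scores.values()), 2)

print(f"computed={computed_avg:.2f} reported={reported_avg:.2f}")
```

Under that assumption the computed and reported values agree to two decimal places.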