Update README.md

Commit: 8aded8e
Committed by: Pankaj Mathur
Parent(s): 0c3d4df

README.md CHANGED
@@ -26,14 +26,10 @@ Here are the zero shot metrics results.
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
 |*arc_easy*|0|0|acc|0.7386|0.0090|
-|*arc_easy*|0|0|acc_norm|0.7066|0.0093|
-|*hellaswag*|0|0|acc|0.5591|0.0050|
 |*hellaswag*|0|0|acc_norm|0.7394|0.0044|
-|*truthfulqa_mc*|0|1|mc1|0.2938|0.0159|
 |*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*mmlu
-|*
-|*Total Zero Shot Average*|0|-|-|0.5373|0.011|
+|*mmlu*|0|1|acc_norm|0.4108|0.0153|
+|*Total Zero Shot Average*|0|-|-|0.5821|0.011|
 
 
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
@@ -43,8 +39,12 @@ please note num_fewshots varies for each below task as used by HuggingFaceH4 Ope
 |||||||
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_challenge*|25|0|acc|0.4846|0.0146|
 |*arc_challenge*|25|0|acc_norm|0.5077|0.0146|
+|*hellaswag*|10|0|acc_norm|0.7617|0.0043|
+|*mmlu*|5|0|acc_norm|-|-|
+|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
+|*Total Average*|0|-|-|0.5697|0.0114|
+
 
 
 
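Review note: the commit does not say how the "Total" rows are computed. The sketch below is a quick sanity check under the assumption that each total is a plain arithmetic mean of the listed Value and Stderr columns, skipping the 5-shot *mmlu* row that has no value yet. The dictionary keys and the `column_means` helper are illustrative names, not part of the README or of any evaluation tool.

```python
from statistics import mean

# (value, stderr) pairs copied from the updated tables in this commit.
# Keys are illustrative labels of the form "task/metric".
zero_shot = {
    "arc_easy/acc": (0.7386, 0.0090),
    "hellaswag/acc_norm": (0.7394, 0.0044),
    "truthfulqa_mc/mc2": (0.4399, 0.0153),
    "mmlu/acc_norm": (0.4108, 0.0153),
}

# The 5-shot mmlu row is listed as "-" in the README, so it is left out here.
leaderboard = {
    "arc_challenge/acc_norm": (0.5077, 0.0146),
    "hellaswag/acc_norm": (0.7617, 0.0043),
    "truthfulqa_mc/mc2": (0.4399, 0.0153),
}


def column_means(rows):
    """Return (mean of values, mean of stderrs) for a table of rows."""
    values = [value for value, _ in rows.values()]
    stderrs = [stderr for _, stderr in rows.values()]
    return mean(values), mean(stderrs)


zs_value, zs_stderr = column_means(zero_shot)
lb_value, lb_stderr = column_means(leaderboard)

print(f"zero shot:   value={zs_value:.6f} stderr={zs_stderr:.6f}")
print(f"leaderboard: value={lb_value:.6f} stderr={lb_stderr:.6f}")
```

Under this assumption the means come to about 0.582175 and 0.569767, which match the reported 0.5821 and 0.5697 if the README truncates to four decimals instead of rounding. The stderr means (0.011 and 0.0114) match the reported values directly. The old total of 0.5373 cannot be checked the same way because the removed *mmlu* rows are cut off in this view.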